Analyzing support for transgender people’s rights across the EU
Survey Research Methodology II Assignment
Loading libraries
rm(list = ls())
library(MASS)
library(performance)
library(tidyverse)
library(haven)
library(readxl)
library(xml2)
library(rvest)
library(janitor)
library(DataExplorer)
library(countrycode)
library(explore)
library(lme4)
library(lmerTest)
library(broom.mixed)
library(ggplot2)
library(e1071)
library(dplyr)
library(scales)
library(tidyr)
library(kableExtra)
library(gmodels)
library(flexplot)
library(corrplot)
library(DescTools)
library(caret)
library(ROSE)
library(randomForest)
library(pROC)
library(mice)
library(glmnet)
library(fastDummies)
library(rempsyc)
library(gbm)
Loading data
Country-level datasets
This data frame lets us join the various country-level datasets together later.
# building a df with country names and codes for the EU-28
codelist <- countrycode::codelist |>
select(country.name.en, iso.name.en, un.name.en, cow.name, ecb, eurostat, iso2c, iso3c, eu28) |>
filter(!is.na(eu28))
GDP per capita
This is GDP per capita, PPP (constant 2021 international $). Retrieved from the World Bank (https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD) for the year 2019.
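The chunk that loads `gdp_data` is not echoed above. A minimal sketch, assuming the standard World Bank indicator CSV layout (four metadata rows before the header row, which explains the `1960`…`2019` year columns in the output below); the temp file here is a stand-in for the real download:

```r
# Hypothetical sketch: World Bank indicator CSVs ship with four metadata rows
# before the header, hence skip = 4; check.names = FALSE keeps year columns
# like `2019` intact instead of mangling them into `X2019`.
tmp <- tempfile(fileext = ".csv")  # stand-in for the downloaded file
writeLines(c('"Data Source","World Development Indicators"', "",
             '"Last Updated Date","2023-01-01"', "",
             '"Country Name","Country Code","Indicator Name","Indicator Code","2019"',
             '"Austria","AUT","GDP per capita, PPP","NY.GDP.PCAP.PP.KD","64630"'),
           tmp)
gdp_data <- read.csv(tmp, skip = 4, check.names = FALSE)
```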
## # A tibble: 6 × 69
## `Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
## <chr> <chr> <chr> <chr> <lgl> <lgl>
## 1 Aruba ABW GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## 2 Africa Eastern… AFE GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## 3 Afghanistan AFG GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## 4 Africa Western… AFW GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## 5 Angola AGO GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## 6 Albania ALB GDP per capita,… NY.GDP.PCAP.PP.… NA NA
## # ℹ 63 more variables: `1962` <lgl>, `1963` <lgl>, `1964` <lgl>, `1965` <lgl>,
## # `1966` <lgl>, `1967` <lgl>, `1968` <lgl>, `1969` <lgl>, `1970` <lgl>,
## # `1971` <lgl>, `1972` <lgl>, `1973` <lgl>, `1974` <lgl>, `1975` <lgl>,
## # `1976` <lgl>, `1977` <lgl>, `1978` <lgl>, `1979` <lgl>, `1980` <lgl>,
## # `1981` <lgl>, `1982` <lgl>, `1983` <lgl>, `1984` <lgl>, `1985` <lgl>,
## # `1986` <lgl>, `1987` <lgl>, `1988` <lgl>, `1989` <lgl>, `1990` <dbl>,
## # `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
gdp_data <- gdp_data |>
select(`Country Name`, `Country Code`, `2019`)
gdp_data <- gdp_data |>
rename(country_name = `Country Name`,
country_code = `Country Code`,
gdp_pc_ppp = `2019`)
gdp_data <- gdp_data |>
inner_join(select(codelist, iso3c), by = c("country_code" = "iso3c"))
str(gdp_data)
## tibble [28 × 3] (S3: tbl_df/tbl/data.frame)
## $ country_name: chr [1:28] "Austria" "Belgium" "Bulgaria" "Cyprus" ...
## $ country_code: chr [1:28] "AUT" "BEL" "BGR" "CYP" ...
## $ gdp_pc_ppp : num [1:28] 64630 60452 27673 46157 47720 ...
We now have 28 observations including the UK.
LGBT rights
We found this LGBT rights index on Our World in Data (https://ourworldindata.org/grapher/lgbt-rights-index), which captures whether LGBT+ people enjoy the same rights as cisgender people by combining information on 18 different policies. It includes the legal status of same-sex marriage.
lgbt_rights_index <- read.csv("data/lgbt-rights-index.csv")
lgbt_rights_index <- lgbt_rights_index |>
filter(Year == 2019) |>
inner_join(select(codelist, iso3c), by = c("Code" = "iso3c")) |>
select(-Year) # all observations are from 2019
str(lgbt_rights_index)
## 'data.frame': 28 obs. of 3 variables:
## $ Entity : chr "Austria" "Belgium" "Bulgaria" "Croatia" ...
## $ Code : chr "AUT" "BEL" "BGR" "HRV" ...
## $ LGBT..Policy.Index: num 8.92 10 4.94 6.96 3.95 ...
Column names can be renamed:
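The rename chunk itself is not echoed. A hypothetical sketch: the source columns come from the `str()` output above, and the target names match those used in the join further below; a stand-in one-row data frame replaces the real `lgbt_rights_index` here:

```r
# Stand-in row with the column names shown by str() above
lgbt_rights_index <- data.frame(Entity = "Austria", Code = "AUT",
                                LGBT..Policy.Index = 8.92)
# Rename to snake_case names consistent with the country-level join later
names(lgbt_rights_index) <- c("country_name", "country_code", "lgbt_policy_index")
```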
Gender inequality index
Gender Development and Gender Inequality indexes, developed by the United Nations Development Programme (UNDP). Retrieved from https://hdr.undp.org/data-center/documentation-and-downloads for the year 2019.
gender_index <- read_xlsx("data/UNDP_gender_indexes.xlsx")
gender_index <- gender_index |>
select(-dimension, -note, -year) |> # empty/useless columns
inner_join(select(codelist, iso3c), by = c("countryIsoCode" = "iso3c"))
gender_index |>
distinct(country) |>
nrow()
## [1] 28
## tibble [560 × 7] (S3: tbl_df/tbl/data.frame)
## $ countryIsoCode: chr [1:560] "AUT" "AUT" "AUT" "AUT" ...
## $ country : chr [1:560] "Austria" "Austria" "Austria" "Austria" ...
## $ indexCode : chr [1:560] "GII" "GDI" "GDI" "GDI" ...
## $ index : chr [1:560] "Gender Inequality Index" "Gender Development Index" "Gender Development Index" "Gender Development Index" ...
## $ indicatorCode : chr [1:560] "abr" "eys_f" "eys_m" "gdi" ...
## $ indicator : chr [1:560] "Adolescent Birth Rate (births per 1,000 women ages 15-19)" "Expected Years of Schooling, female (years)" "Expected Years of Schooling, male (years)" "Gender Development Index (value)" ...
## $ value : num [1:560] 5.499 16.451 15.725 0.969 0.053 ...
We checked that we indeed have 28 distinct countries in the dataset.
This dataset collects many indicators besides the index values. We select only the Gender Development and Gender Inequality indexes (GDI and GII):
gender_index <- gender_index |>
filter(indicatorCode %in% c("gdi", "gii")) |>
select(-c(indexCode, indicatorCode, indicator)) |> # removing redundant columns
pivot_wider(names_from = index, values_from = value)
colnames(gender_index) <- janitor::make_clean_names(colnames(gender_index)) # cleaning spaces and upper cases
And then exploring them to see which one is a better fit for modeling and predictions:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0160 0.0555 0.0890 0.1067 0.1335 0.2390
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9560 0.9725 0.9855 0.9856 0.9918 1.0230
## [1] 0.06664423
## [1] 0.0167627
We observe that the second one (the Gender Development Index) has a very small range, indicating that most countries have nearly identical scores. This is supported by the standard deviations, which show that the GII is more spread out than the GDI. Being almost constant, the GDI won’t add much value to the analysis. This is likely because these indexes are built by the United Nations Development Programme for all countries worldwide, so they may not capture the finer differences between EU countries.
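The spread comparison above can be reproduced with `sd()` and `range()`; a sketch on stand-in values chosen to mimic the summaries shown (the actual chunk, which is not echoed, presumably calls `summary()` and `sd()` on the two index columns):

```r
# Stand-in vectors mimicking the reported ranges: the GII spans roughly
# 0.016-0.239 while the GDI barely moves between 0.956 and 1.023
gii <- c(0.016, 0.053, 0.089, 0.134, 0.205, 0.239)
gdi <- c(0.956, 0.972, 0.985, 0.992, 1.005, 1.023)
c(sd_gii = sd(gii), sd_gdi = sd(gdi))                     # GII is far more dispersed
c(range_gii = diff(range(gii)), range_gdi = diff(range(gdi)))
```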
cor(gender_index$gender_inequality_index,
gender_index$gender_development_index,
    use = "complete.obs")
## [1] 0.1750589
Surprisingly, the two indexes have only a very weak positive correlation. Since we interpreted them as measuring opposite things, we had expected a negative correlation. The correlation coefficient may simply not be informative here because of the low variation of both variables (especially the GDI), so ultimately we chose to keep the GII and discard the GDI.
Economist’s Democracy Index
democracy_index <- read_xlsx("data/EIU_democracy_index.xlsx", sheet = 4)
# the ISO codes were lowercase which impedes the join
democracy_index$geo <- toupper(democracy_index$geo)
# filter for 2019 and EU28 countries
democracy_index <- democracy_index |>
filter(time == 2019) |>
inner_join((select(codelist, iso3c)), by = c("geo" = "iso3c"))
# clean var names
names(democracy_index) <- names(democracy_index) %>%
janitor::make_clean_names() %>%
gsub("_eiu$", "", .)
democracy_index <- democracy_index |>
rename(country_code = geo,
country_name = name,
year = time)
str(democracy_index)
## tibble [28 × 10] (S3: tbl_df/tbl/data.frame)
## $ country_code : chr [1:28] "AUT" "BEL" "BGR" "HRV" ...
## $ country_name : chr [1:28] "Austria" "Belgium" "Bulgaria" "Croatia" ...
## $ year : num [1:28] 2019 2019 2019 2019 2019 ...
## $ democracy_index : num [1:28] 82.9 76.4 70.3 65.7 75.9 76.9 92.2 79 92.5 81.2 ...
## $ electoral_pluralism_index : num [1:28] 95.8 95.8 91.7 91.7 91.7 95.8 100 95.8 100 95.8 ...
## $ government_index : num [1:28] 78.6 82.1 64.3 60.7 64.3 67.9 92.9 78.6 89.3 78.6 ...
## $ political_participation_index: num [1:28] 83.3 50 72.2 55.6 66.7 66.7 83.3 66.7 88.9 77.8 ...
## $ political_culture_index : num [1:28] 68.8 68.8 43.8 50 68.8 68.8 93.8 68.8 87.5 68.8 ...
## $ civil_liberties_index : num [1:28] 88.2 85.3 79.4 70.6 88.2 85.3 91.2 85.3 97.1 85.3 ...
## $ change_in_democracy_index : num [1:28] 0 -1.4 0 0 0 ...
In this case, we are only keeping the overall index value.
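The chunk dropping the sub-indices is not echoed; a sketch, with column names taken from the `str()` output above and a stand-in row for illustration (the country columns are kept because the join downstream needs them):

```r
# Stand-in row; the real data frame has all the sub-indices shown above
democracy_index <- data.frame(country_code = "AUT", country_name = "Austria",
                              year = 2019, democracy_index = 82.9,
                              civil_liberties_index = 88.2)
# Keep only the join keys and the headline democracy_index
democracy_index <- democracy_index[, c("country_code", "country_name",
                                       "democracy_index")]
```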
Joining all country-level data together
country_level_data <- codelist |>
select(iso3c, iso2c) |>
left_join(gdp_data, by = c("iso3c" = "country_code")) |>
# left_join(rural_data, by = c("iso3c" = "country_code")) |> rural population data will probably be discarded
left_join(gender_index, by = c("iso3c" = "country_iso_code")) |>
left_join(lgbt_rights_index, by = c("iso3c" = "country_code")) |>
left_join(democracy_index, by = c("iso3c" = "country_code")) |>
select(-contains("country_")) # removing all duplicated country_name columns that were joined from each data frame
str(country_level_data)
## tibble [28 × 7] (S3: tbl_df/tbl/data.frame)
## $ iso3c : chr [1:28] "AUT" "BEL" "BGR" "HRV" ...
## $ iso2c : chr [1:28] "AT" "BE" "BG" "HR" ...
## $ gdp_pc_ppp : num [1:28] 64630 60452 27673 35094 46157 ...
## $ country : chr [1:28] "Austria" "Belgium" "Bulgaria" "Croatia" ...
## $ gender_inequality_index: num [1:28] 0.053 0.048 0.205 0.119 0.235 0.124 0.016 0.092 0.031 0.086 ...
## $ lgbt_policy_index : num [1:28] 8.92 10 4.94 6.96 3.95 ...
## $ democracy_index : num [1:28] 82.9 76.4 70.3 65.7 75.9 76.9 92.2 79 92.5 81.2 ...
Data cleaning
Selecting relevant variables
We are discarding all variables that relate to trade and globalization (qa), as they were deemed not relevant for our analysis.
We are also discarding variables related to energy policies (qb).
We have also not considered the questions specifically about Roma (e.g. qc8, qc14 and qc16).
Some of the doubts we had:
AGE
x_df <- data_raw |> summarise(mean = mean(qc19), .by = d11)
ggplot(x_df, aes(x = d11, y = mean)) + geom_point()
As the relationship seems pretty linear, we are going to use the continuous variable for age.
POLITICAL OPINIONS
x_df <- data_raw |> summarise(mean = mean(qc19), .by = d1)
ggplot(x_df, aes(x = d1, y = mean)) + geom_point() + xlim(1,10)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
For political opinions the relationship seems less linear, so we will work with the categorical variable. Indeed, there seems to be a clear group (1 to 4), which matches the grouping used in the 3-category variable (left, centre and right), so we will use the 3-category variable.
MARRIAGE
x_df <- data_raw |> summarise(mean = mean(qc19), .by = d7r1)
ggplot(x_df, aes(x = d7r1, y = mean)) + geom_point() + xlim(1,5)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
Of all the possible combinations, this one seems the best at highlighting differences in our target variable.
NATIONALITY
There is no easy way to create an immigrant dummy by checking whether someone does not have the nationality of the country in which they are being interviewed. So we simply use q1_29 as a dummy for whether someone holds a non-EU nationality.
POLITICAL INTEREST
Using only polintr as a summary of the whole d71 battery of questions about how strong your interest in politics is across various domains.
EXPERIENCED DISCRIMINATION
For the questions about whether you have experienced discrimination (qc2_) we keep only qc2_15, a binary on whether you have experienced any discrimination at all, and drop the other categories that indicate the particular motive behind the discrimination. We can probably assume that if a person is part of a specific minority (information available from sd2) and is being discriminated against, it is because of being part of that minority (at least in most cases), so this felt like information we already had (and which we can easily put into our regression using interaction effects).
PERCEIVED DISCRIMINATION in country
This question (qc1) asks how widespread you perceive discrimination to be in your country. It is a rather peculiar question because it asks individuals for a perception at the country level. It might be more useful (by aggregating results by country) to use it to build a country-level indicator of perceived discrimination against the minorities of interest, rather than as an individual-level variable. So we keep it out of the dataset for now.
For the same reason we also discard qc4, which asks whether, in your country, candidates with certain characteristics would be at a disadvantage in the recruitment process. It still essentially asks about perceived discrimination at the country level.
The same reasoning applies to qc7, which asks whether efforts towards reducing discrimination are effective in your country.
HOW DISCRIMINATORY ARE YOU SCORE
We use qc12 and qc13 to build a score from 1 to 10 of how discriminatory you are against certain minorities. We build the score for each minority and can decide later whether some are irrelevant.
Note that for some categories this score is misleading (e.g. answering that you would feel uncomfortable if your child was in a love relationship with an old person would give you a high discrimination score against old people; this is an example of a category for which it is worth excluding the index).
We can also aggregate further to obtain just one score (or a religious/ethnic score, etc.).
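The further aggregation mentioned above can be sketched as follows, on a stand-in data frame; it assumes the per-minority columns follow the *_discri naming pattern created later in the cleaning, and simply averages them row-wise:

```r
# Stand-in per-minority scores on the 1-10 scale described above
df <- data.frame(roma_discri = c(4.5, 7.0), muslim_discri = c(4.0, 6.5),
                 lgb_discri = c(4.4, 8.0))
# One overall score: the row-wise mean of every *_discri column
df$overall_discri <- rowMeans(df[, grep("_discri$", names(df))], na.rm = TRUE)
```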
qc6 also asks about discriminatory attitudes, but about elected public officials, not about a situation that affects you personally. It also uses different categories for minorities, so we prefer qc12 and qc13 over qc6.
SUPPORTIVE OF LGBTQ RIGHTS INDEX
Using qc15 we can again build an index of how supportive of LGBTQ rights a person is.
We again take the mean across the different answers, but we could also use the median or mode if we think that is better.
We could also use qc18 to try to capture anti-LGBTQ sentiment, but we preferred qc15 as it seemed more straightforward (we therefore deleted qc18, but it can be put back if useful).
MY VOICE COUNTS
We suspect that people who feel alienated from society and politics might be less likely to support LGBTQ rights, so we take the mean of the d72 battery, which asked whether you felt your voice mattered.
OTHERS
qc11 is a strange question that felt useless to us, so we removed it. We also removed qc9, as we don’t really know what to do with it, and qc17.
data <- data_raw |>
select(serialid, # unique identifier
isocntry, # 2 digit country code
d11, # age variables
q1_29, # nationality of interviewee. options given: EU28+Other+DK, using only other = outside of EU
d70, #life satisfaction
polintr, # political interest index (summarizes d71 questions)
starts_with("sd1"), # friends that are minority groups
starts_with("sd2"), # are you part of a minority
sd3, # religion
qc2_15, # experienced discrimination yourself
qc3, # where discrimination took place
starts_with("qc5"), # actions against discrimination
qc10, # how would you report discrimination
starts_with("qc12"), # feelings about colleagues being minority
starts_with("qc13"), # feelings about kid being in a relationship with minority
starts_with("qc15"), # opinions about lgbtqi
qc19, # target variable transgender
qc20, # non-binary genders in documents
d1r1, # political ideology
d7r1, # marital status
d10, # gender binary
d8, # education
d15a_r2, # current occupation (discarded previous occupation d15b)
d25, # rural vs urban
d43t, # phone availability
d60, # financial stress (paying bills)
netuse, # internet index
d63, # social class
starts_with("d72"), # my voice counts
)
paradata <- data_raw |>
select(serialid, # to match it with the other data
p2, p3, p3r, p4, p5) # paradata
Removing some of the previously selected variables
We drop sd2_7 to sd2_10, possible answers deemed irrelevant to the question about being part of the following minorities: sd2_7, other minorities (we don’t know what other relevant minorities could be there); sd2_8, not part of any minority (can be deduced from the rest); sd2_9, refusal to respond; sd2_10, don’t-know answers. We also drop sd2t, a summary binary variable for being part of any minority (we are keeping the more specific ones).
We also remove qc12_nr, as we prefer to work with the full variable instead of the recoded version with fewer categories. The same goes for qc13 and qc18.
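The drop itself is not echoed; a hypothetical sketch on a stand-in data frame (qc13_nr and qc18_nr are assumed names for the recoded versions, by analogy with qc12_nr):

```r
# Stand-in data frame with a few of the columns discussed above
df <- data.frame(sd2_1 = 0, sd2_7 = 0, sd2_8 = 1, sd2_9 = 0, sd2_10 = 0,
                 sd2t = 1, qc12_nr = 3)
# Columns to drop: the irrelevant sd2 answers, the summary dummy,
# and the recoded _nr versions (names partly assumed)
drop_cols <- c(paste0("sd2_", 7:10), "sd2t", "qc12_nr", "qc13_nr", "qc18_nr")
df <- df[, setdiff(names(df), drop_cols), drop = FALSE]
```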
Check overall data quality
All columns are numeric; the only one that is not is isocntry:
## # A tibble: 27,438 × 1
## isocntry
## <chr>
## 1 BE
## 2 BE
## 3 BE
## 4 BE
## 5 BE
## 6 BE
## 7 BE
## 8 BE
## 9 BE
## 10 BE
## # ℹ 27,428 more rows
Most columns do not have explicit NAs:
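The check itself is not echoed; a minimal sketch on a toy data frame — count explicit NAs per column and keep only the columns that have any:

```r
# Toy data frame for illustration; in the pipeline this would run on `data`
df <- data.frame(a = c(1, NA, 3), b = c(1, 2, 3))
na_counts <- colSums(is.na(df))   # explicit NAs per column
na_counts[na_counts > 0]          # columns with at least one NA
```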
Exploiting the fact that the .dta file has attributes (labels) for all its columns:
If we look up attr(colname, "label") we get back the original long name of the variable.
If we instead look up attr(colname, "labels") we get back all the possible encoded levels of the variable (e.g. 1, 2, 3, 4, etc.), and names(attr(colname, "labels")) gives us the actual meaning of those numbers (e.g. 1 = “Yes”, 2 = “No”, etc.).
We will now look into the labels for the different levels that our factor variables can take:
## Very satisfied Fairly satisfied Not very satisfied
## 1 2 3
## Not at all satisfied DK
## 4 5
## [1] "Very satisfied" "Fairly satisfied" "Not very satisfied"
## [4] "Not at all satisfied" "DK"
## # A tibble: 5 × 2
## name_labels labels
## <chr> <dbl>
## 1 Very satisfied 1
## 2 Fairly satisfied 2
## 3 Not very satisfied 3
## 4 Not at all satisfied 4
## 5 DK 5
# Create a list of tibbles containing the labels and their associated name for each variable
list_label_tibbles <-
#Applies a function across all columns of a df and returns results as a list
lapply(names(data), function(col_name) {
labels <- attr(data[[col_name]], "labels") # Extract labels
name_labels <- names(labels) # Extract label names
# Create tibble with the extracted data only if labels exist
if (!is.null(labels)) {
tibble(name_labels = name_labels, labels = labels)}
else {NULL} # Returns a NULL element for columns without labels
})
# Giving to each element of the list as name the name of the variable
list_label_tibbles <- setNames(list_label_tibbles, names(data))
# For example
list_label_tibbles$d70
## # A tibble: 5 × 2
## name_labels labels
## <chr> <dbl>
## 1 Very satisfied 1
## 2 Fairly satisfied 2
## 3 Not very satisfied 3
## 4 Not at all satisfied 4
## 5 DK 5
The right column is what appears in our data (as a number); the left column is the label that we must assign to that number when we factorize.
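A base-R sketch of this number-to-label mapping (the pipeline itself uses labelled::to_factor below); the "labels" attribute maps stored codes to their meanings, e.g. for d70:

```r
x <- c(1, 2, 2, 5)                                   # raw codes as stored in the .dta
labs <- c("Very satisfied" = 1, "Fairly satisfied" = 2,
          "Not very satisfied" = 3, "Not at all satisfied" = 4, "DK" = 5)
# Codes become levels, label names become the factor labels
f <- factor(x, levels = unname(labs), labels = names(labs))
```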
Cleaning
Initial cleaning and pre-processing
Recoding Germany East and West together, because we are running the analysis at the country level and they are coded separately.
## [1] "BE" "DK" "GR" "ES" "FI" "FR" "IE" "IT" "LU" "NL"
## [11] "AT" "PT" "SE" "DE-W" "DE-E" "GB" "BG" "CY" "CZ" "EE"
## [21] "HU" "LV" "LT" "MT" "PL" "RO" "SK" "SI" "HR"
data <- data |>
mutate(isocntry = case_when(
isocntry %in% c("DE-W", "DE-E") ~ "DE",
TRUE ~ isocntry))
# We might want to join the full names of the countries using the codelist df
Separating the variables for which we can use the attribute labels to factorize them from the variables for which this strategy cannot be used:
data <- data |>
rename(friends_trans = sd1_7)
non_factor_variables <- c("serialid", "tnscntry", "isocntry", "d11", "q1_29",
names(data)[startsWith(names(data), "sd1_")],
names(data)[startsWith(names(data), "sd2_")],
names(data)[startsWith(names(data), "qc5_")],
names(data)[startsWith(names(data), "qc12_")],
names(data)[startsWith(names(data), "qc13_")],
names(data)[startsWith(names(data), "qc15_")],
names(data)[startsWith(names(data), "d72_")],
"d8", "opls")
factor_variables <- setdiff(names(data), non_factor_variables)
Correctly encoding the factor variables that do not need any further cleaning:
# Converting them to factors and assign them their labels automatically
data <- data |>
mutate(across(all_of(factor_variables), labelled::to_factor))
# Turning DK into NAs for all the factor variables
data <- data |>
mutate(across(all_of(factor_variables), ~ fct_na_level_to_value(., extra_levels = "DK")))
# Converting to numeric the variables that should be numeric
data <- data |>
mutate(age = as.numeric(d11),
years_edu = as.numeric(d8)) |>
# Recoding the d8 education variable correctly (special codes checked against unique(data_raw$d8))
mutate(years_edu = case_when(
years_edu %in% c(0, 99) ~ NA, # Refusal and DK as NAs
years_edu == 97 ~ 0, # No full time education = 0
years_edu == 98 ~ age, # still studying = age
TRUE ~ years_edu)) |>
select(-c(d11, d8))
# Converting q1_29 to factor without assigning labels (1 = non-EU nationality, 0 = EU nationality)
# Same for sd2_
data <- data |>
mutate(nonEU_national = as.factor(q1_29),
across(starts_with("sd2_"), ~ as.factor(.x))) |>
select(-q1_29)
Dealing with the other variables on which we do some further pre-processing (feature engineering).
Creating a variable that counts the number of different minority groups a person has acquaintances with:
data <- data |>
mutate(across(starts_with("sd1"), ~ if_else(.x == 1, 1, 0))) |>
mutate(n_friends_minorities = sd1_1+sd1_2+sd1_3+sd1_4+sd1_5+sd1_6+sd1_8) |>
relocate(n_friends_minorities, .before="sd1_1") |>
select(-starts_with("sd1"))
Creating a variable that counts the number of actions against discrimination that you have taken in the last year:
data <- data |>
mutate(across(starts_with("qc5"), ~ if_else(.x == 1, 1, 0))) |>
mutate(n_actions_against_discri = qc5_1+qc5_2+qc5_3+qc5_4) |>
relocate(n_actions_against_discri, .after="qc3") |>
select(-starts_with("qc5"))
Building a discriminatory score:
data <- data |>
# Coding as NAs "it depends" and "DK"
mutate(across(starts_with("qc12"), ~ if_else(.x >= 12, NA, .x))) |>
mutate(across(starts_with("qc13"), ~ if_else(.x >= 12, NA, .x))) |>
# Coding as 5 responses = indifferent
mutate(across(starts_with("qc12"), ~ if_else(.x == 11, 5, .x))) |>
mutate(across(starts_with("qc13"), ~ if_else(.x == 11, 5, .x))) |>
# Modifying such that higher is more discriminatory
mutate(roma_discri = 11 - rowMeans(cbind(qc12_1, qc13_1), na.rm = TRUE),
black_discri = 11 - rowMeans(cbind(qc12_2, qc13_2), na.rm = TRUE),
asian_discri = 11 - rowMeans(cbind(qc12_3, qc13_3), na.rm = TRUE),
white_discri = 11 - rowMeans(cbind(qc12_4, qc13_4), na.rm = TRUE),
jewish_discri = 11 - rowMeans(cbind(qc12_5, qc13_5), na.rm = TRUE),
muslim_discri = 11 - rowMeans(cbind(qc12_6, qc13_6), na.rm = TRUE),
buddihst_discri = 11 - rowMeans(cbind(qc12_7, qc13_7), na.rm = TRUE),
christian_discri = 11 - rowMeans(cbind(qc12_8, qc13_8), na.rm = TRUE),
atheist_discri = 11 - rowMeans(cbind(qc12_9, qc13_9), na.rm = TRUE),
lgb_discri = 11 - rowMeans(cbind(qc12_10, qc13_10), na.rm = TRUE),
trans_discri = 11 - rowMeans(cbind(qc12_11, qc13_11), na.rm = TRUE),
intersex_discri = 11 - rowMeans(cbind(qc12_12, qc13_12), na.rm = TRUE),
disability_discri = 11 - rowMeans(cbind(qc12_13, qc13_13), na.rm = TRUE),
young_discri = 11 - rowMeans(cbind(qc12_14, qc13_14), na.rm = TRUE),
old_discri = 11 - rowMeans(cbind(qc12_15, qc13_15), na.rm = TRUE)) |>
select(-starts_with("qc12")) |>
select(-starts_with("qc13"))
We delete the discrimination indices against young and old people. This is because one of the underlying questions asks how comfortable you would feel if one of your children was in a love relationship with a person from group x. It is perfectly acceptable not to feel comfortable with your kid dating an old person, without that implying discrimination against old people.
Building a score of how supportive or anti lbtq rights you are:
data <- data |>
mutate(across(starts_with("qc15"), ~ if_else(.x == 5, NA, .x))) |>
mutate(antilgbtq_rights = round(rowMeans(cbind(qc15_1, qc15_2, qc15_3), na.rm = TRUE), 2)) |>
select(-starts_with("qc15"))
# Scale of 1 to 4: 1 = supportive, 4 = homophobic
Aggregating together questions on whether you think your voice counts:
data <- data |>
mutate(across(starts_with("d72"), ~ if_else(.x > 4, NA, .x))) |>
mutate(social_alienation = rowMeans(cbind(d72_1, d72_2), na.rm = TRUE)) |>
select(-starts_with("d72"))
# The higher, the more people think their voice does not matter
This should be our final selection of variables.
We still need to rename them appropriately and check that all the NAs are correctly encoded by looking at the summary.
Further cleaning
Rename columns appropriately
data <- data |>
rename(
country = isocntry,
life_sat = d70,
ethnic_minority = sd2_1,
skincolor_minority = sd2_2,
religious_minority = sd2_3,
roma_minority = sd2_4,
sexual_minority = sd2_5,
disability_minority = sd2_6,
religion = sd3,
disc = qc2_15,
disc_where = qc3,
disc_contact = qc10,
trans_docs = qc19,
gender_docs = qc20,
left_right = d1r1,
marital_status = d7r1,
gender = d10,
occupation = d15a_r2,
community = d25,
phone_access = d43t,
bill_issues = d60,
internet_use = netuse,
social_class = d63
  )
Now, we will transform the variable disc, which stores whether you have been subject to discrimination. It was originally coded in a negative way (1 = “Not mentioned”, 2 = “No, you haven’t been discriminated against”), so we turn it into a positive dummy variable to make its interpretation more straightforward. The new variable is 1 if a person has been subject to discrimination:
data <- data |>
mutate(suffered_discr = as.factor(ifelse(disc == "Not mentioned", 1, 0))) |>
select(-disc) |>
relocate(suffered_discr, .before = disc_where)
Move the variables around to obtain an ordered dataframe, and assign labels to help interpretation:
# Relocating to have an ordered df
data <- data |>
relocate(c("age", "gender", "years_edu","community", "marital_status", "occupation", "social_class", "religion", "nonEU_national", "phone_access", "bill_issues", "internet_use"), .after = country) |>
relocate(c("left_right", "social_alienation"), .after = polintr) |>
relocate(c("friends_trans", "n_friends_minorities", "n_actions_against_discri"), .after = gender_docs)
# Assigning labels to columns which have a difficult meaning
attr(data$gender, "label") <- NULL
attr(data$nonEU_national, "label") <- "OWNS A NON-EU NATIONALITY"
attr(data$social_alienation, "label") <- "HIGHER -> THINK THEIR VOICE DOESN'T MATTER"
attr(data$ethnic_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$skincolor_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$religious_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$roma_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$sexual_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$disability_minority, "label") <- "ARE YOU PART OF X MINORITY"
attr(data$suffered_discr, "label") <- "HAVE YOU BEEN SUBJECT TO DISCRIMINATION"
attr(data$n_friends_minorities, "label") <- "YOU KNOW PEOPLE FROM # NUMBER OF DIFFERENT MINORITES"
attr(data$n_actions_against_discri, "label") <- "YOU HAVE DONE # NUMBER OF DIFFERENT ACTIONS TO FIGHT DISCRIMINATIONS"
attr(data$roma_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$black_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$asian_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$white_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$jewish_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$muslim_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$buddihst_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$christian_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$atheist_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$lgb_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$trans_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$intersex_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$disability_discri, "label") <- "HOW DISCRIMINATORY ARE YOU AGAINST X"
attr(data$antilgbtq_rights, "label") <- "HIGHER -> THEY OPPOSE MORE RIGHTS TO LGBTQ"
This is our dataset so far:
## serialid country age gender
## Min. : 1 Length:27438 Min. :15.00 Man :12492
## 1st Qu.: 6860 Class :character 1st Qu.:37.00 Woman:14946
## Median :13720 Mode :character Median :53.00
## Mean :13720 Mean :51.56
## 3rd Qu.:20579 3rd Qu.:66.00
## Max. :27438 Max. :98.00
##
## years_edu community
## Min. : 0.00 Rural area or village : 8776
## 1st Qu.:17.00 Small or middle sized town:10767
## Median :18.00 Large town : 7881
## Mean :19.66 NA's : 14
## 3rd Qu.:22.00
## Max. :90.00
## NA's :387
## marital_status
## (Re-)Married (1-4 in d7) :14673
## Single living with partner (5-8 in d7): 3321
## Single (9-10 in d7) : 4314
## Divorced or separated (11-12 in d7) : 2243
## Widow (13-14 in d7) : 2712
## Other (SPONT.) : 107
## Refusal (SPONT.) : 68
## occupation
## Retired (4 in d15a) :8791
## Manual workers (15 to 18 in d15a) :5883
## Other white collars (13 or 14 in d15a):3536
## Managers (10 to 12 in d15a) :2911
## Self-employed (5 to 9 in d15a) :1979
## Students (2 in d15a) :1676
## (Other) :2662
## social_class religion
## The middle class of society :13068 Catholic :11198
## The working class of society : 7233 Orthodox Christian : 4016
## The lower middle class of society: 4070 Non believer or agnostic: 3694
## The upper middle class of society: 1889 Protestant : 3031
## None (SPONTANEOUS) : 229 Atheist : 2109
## (Other) : 389 (Other) : 3220
## NA's : 560 NA's : 170
## nonEU_national phone_access bill_issues
## 0:27316 Mobile only :14555 Most of the time : 2054
## 1: 122 Landline only : 802 From time to time : 6538
## Landline & mobile:11576 Almost never/never:18467
## No telephone : 505 Refusal (SPONT.) : 379
##
##
##
## internet_use life_sat
## Everyday/Almost everyday :19900 Very satisfied : 7242
## Two or three times a week : 1701 Fairly satisfied :15356
## About once a week : 434 Not very satisfied : 3736
## Two or three times a month: 164 Not at all satisfied: 1002
## Less often : 350 NA's : 102
## Never/No access : 4311
## No Internet access at all : 578
## polintr left_right social_alienation ethnic_minority
## Strong : 4644 (1 - 4) Left :7082 Min. :1.000 0:26603
## Medium :13701 (5 - 6) Centre:9313 1st Qu.:1.500 1: 835
## Low : 4387 (7 -10) Right :6354 Median :2.000
## Not at all: 4706 DK/Refusal :4689 Mean :2.329
## 3rd Qu.:3.000
## Max. :4.000
## NA's :1000
## skincolor_minority religious_minority roma_minority sexual_minority
## 0:26923 0:26416 0:27001 0:27006
## 1: 515 1: 1022 1: 437 1: 432
##
##
##
##
##
## disability_minority suffered_discr
## 0:26756 0:23150
## 1: 682 1: 4288
##
##
##
##
##
## disc_where
## In a public space : 893
## At work : 788
## When looking for a job : 642
## At a café, restaurant, bar or nightclub : 322
## By healthcare personnel (e.g. a receptionist, nurse or doctor): 308
## (Other) : 1054
## NA's :23431
## disc_contact
## The police :8769
## A friend or family member :5386
## An equalities body or ombudsman (SPECIFY THE NAME ACCORDING TO THE COUNTRY):3795
## A lawyer :2329
## Courts :1225
## (Other) :3601
## NA's :2333
## trans_docs gender_docs friends_trans n_friends_minorities
## Yes :14463 Yes :10856 Yes : 2644 Min. :0.000
## No : 9695 No :13395 No :23520 1st Qu.:1.000
## NA's: 3280 NA's: 3187 Refusal (SPONTANEOUS): 306 Median :3.000
## NA's : 968 Mean :3.029
## 3rd Qu.:5.000
## Max. :7.000
##
## n_actions_against_discri roma_discri black_discri asian_discri
## Min. :0.0000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.:0.0000 1st Qu.: 1.500 1st Qu.: 1.000 1st Qu.: 1.000
## Median :0.0000 Median : 4.500 Median : 3.000 Median : 2.500
## Mean :0.3855 Mean : 4.631 Mean : 3.579 Mean : 3.387
## 3rd Qu.:0.0000 3rd Qu.: 7.000 3rd Qu.: 5.500 3rd Qu.: 5.500
## Max. :4.0000 Max. :10.000 Max. :10.000 Max. :10.000
## NA's :757 NA's :542 NA's :592
## white_discri jewish_discri muslim_discri buddihst_discri
## Min. : 1.000 Min. : 1.00 Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.00 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 1.000 Median : 2.00 Median : 4.000 Median : 3.000
## Mean : 1.785 Mean : 3.18 Mean : 4.363 Mean : 3.576
## 3rd Qu.: 2.000 3rd Qu.: 5.00 3rd Qu.: 6.500 3rd Qu.: 5.500
## Max. :10.000 Max. :10.00 Max. :10.000 Max. :10.000
## NA's :403 NA's :603 NA's :662 NA's :761
## christian_discri atheist_discri lgb_discri trans_discri
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 1.000 Median : 1.500 Median : 4.000 Median : 5.000
## Mean : 1.904 Mean : 2.802 Mean : 4.399 Mean : 4.934
## 3rd Qu.: 2.000 3rd Qu.: 4.000 3rd Qu.: 6.500 3rd Qu.: 7.500
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## NA's :414 NA's :517 NA's :536 NA's :1081
## intersex_discri disability_discri antilgbtq_rights
## Min. : 1.000 Min. : 1.000 Min. :1.000
## 1st Qu.: 1.500 1st Qu.: 1.000 1st Qu.:1.000
## Median : 5.000 Median : 2.500 Median :2.000
## Mean : 4.825 Mean : 2.977 Mean :2.147
## 3rd Qu.: 7.500 3rd Qu.: 4.500 3rd Qu.:3.000
## Max. :10.000 Max. :10.000 Max. :4.000
## NA's :1331 NA's :585 NA's :791
Our dataset is fairly balanced across the different countries:
## # A tibble: 28 × 2
## country n
## <chr> <int>
## 1 AT 1027
## 2 BE 1028
## 3 BG 1032
## 4 CY 503
## 5 CZ 1008
## 6 DE 1537
## 7 DK 1004
## 8 EE 1003
## 9 ES 1005
## 10 FI 1003
## # ℹ 18 more rows
Although some levels of occupation have more observations than
others, each level has enough observations, so we do not need to
aggregate levels of the variable
occupation.
## # A tibble: 8 × 2
## occupation n
## <fct> <int>
## 1 Self-employed (5 to 9 in d15a) 1979
## 2 Managers (10 to 12 in d15a) 2911
## 3 Other white collars (13 or 14 in d15a) 3536
## 4 Manual workers (15 to 18 in d15a) 5883
## 5 House persons (1 in d15a) 1358
## 6 Unemployed (3 in d15a) 1304
## 7 Retired (4 in d15a) 8791
## 8 Students (2 in d15a) 1676
Very few respondents place themselves in the higher class of society; nonetheless, we keep this level because it is substantively meaningful.
## # A tibble: 9 × 2
## social_class n
## <fct> <int>
## 1 The working class of society 7233
## 2 The lower middle class of society 4070
## 3 The middle class of society 13068
## 4 The upper middle class of society 1889
## 5 The higher class of society 157
## 6 Other (SPONTANEOUS) 59
## 7 None (SPONTANEOUS) 229
## 8 Refusal (SPONTANEOUS) 173
## 9 <NA> 560
Given the low number of observations in some categories of the
variable religion and the high number of categories, we opt
to aggregate some of them; otherwise we risk introducing too much noise
into our models.
## # A tibble: 16 × 2
## religion n
## <fct> <int>
## 1 Catholic 11198
## 2 Orthodox Christian 4016
## 3 Protestant 3031
## 4 Other Christian 1183
## 5 Jewish 58
## 6 Muslim - Shia 77
## 7 Muslim - Sunni 178
## 8 Other Muslim 137
## 9 Sikh 13
## 10 Buddhist 58
## 11 Hindu 32
## 12 Atheist 2109
## 13 Non believer or agnostic 3694
## 14 Other 1171
## 15 Refusal (SPONTANEOUS) 313
## 16 <NA> 170
# Checking if the means of groups we are going to aggregate are similar
data |>
mutate(trans_docs=as.numeric(trans_docs)) |>
summarise(mean = mean(trans_docs, na.rm=TRUE), .by = religion) |>
arrange(mean)## # A tibble: 16 × 2
## religion mean
## <fct> <dbl>
## 1 Sikh 1.25
## 2 Atheist 1.27
## 3 Non believer or agnostic 1.28
## 4 Protestant 1.30
## 5 Buddhist 1.33
## 6 Other 1.37
## 7 Jewish 1.38
## 8 Other Christian 1.43
## 9 Catholic 1.43
## 10 Hindu 1.44
## 11 Muslim - Shia 1.45
## 12 <NA> 1.45
## 13 Other Muslim 1.49
## 14 Refusal (SPONTANEOUS) 1.49
## 15 Muslim - Sunni 1.54
## 16 Orthodox Christian 1.58
We group atheists together with agnostics. We also move Sikhs, Buddhists, Jews and Hindus into the Other category because they have very few observations. Finally, we group all Muslims together.
data <- data |>
mutate(religion = fct_collapse(religion,
"Non-believers" = c("Atheist", "Non believer or agnostic"),
"Other" = c("Sikh", "Buddhist", "Jewish", "Hindu", "Other"),
"Muslim" = c("Muslim - Shia", "Muslim - Sunni", "Other Muslim")))
#This is what we end up with
data |> count(religion)## # A tibble: 9 × 2
## religion n
## <fct> <int>
## 1 Catholic 11198
## 2 Orthodox Christian 4016
## 3 Protestant 3031
## 4 Other Christian 1183
## 5 Other 1332
## 6 Muslim 392
## 7 Non-believers 5803
## 8 Refusal (SPONTANEOUS) 313
## 9 <NA> 170
Recoding missing values
In this section we convert all redundant factor levels to NA.
First we substitute the "Refusal" answers in the remaining
variables with NA: if the respondent refuses to answer, it is equivalent to
not knowing the answer. We exclude friends_trans
because for that variable refusals could be worth analyzing.
factor_variables <- names(data)[sapply(data, is.factor)]
data <- data %>%
mutate(across(
all_of(setdiff(factor_variables, "friends_trans")), # Exclude "friends_trans"
~ {
# Look for levels that contain "REFUSAL"
refusal_levels <- grep("REFUSAL", levels(.), value = TRUE, ignore.case = TRUE)
# If there are levels containing "REFUSAL"
if (length(refusal_levels) > 0) {
# Convert those levels to NA (robust even if several levels match,
# unlike fct_recode(., NULL = ...), which accepts only a single string)
fct_na_level_to_value(., refusal_levels)
}
# Otherwise leave the factor unchanged
else {
.
}
}
))Mutate the NONE level of social_class to NA.
We also turn marital_status's level OTHER into NA, as
it is difficult to give it any other meaning; the same goes for social_class.
data <- data %>%
mutate(social_class = fct_recode(social_class, NULL = "None (SPONTANEOUS)"))
data <- data %>%
mutate(marital_status = fct_recode(marital_status, NULL = "Other (SPONT.)"),
social_class = fct_recode(social_class, NULL = "Other (SPONTANEOUS)"))The total number of missing observations is quite low (about 4%), so missing data should not be a major problem.
Given its high percentage of missing values we delete
disc_where. This variable was expected to have high
missingness, as the question applies only to people who have
been subject to discrimination, so it is not a variable
of major importance.
The variable with the second-highest share of missing data is
left_right; given the importance of this variable, it is
probably worth imputing those values.
Next come the variables trans_docs and
gender_docs, which are, respectively, our target variable and
a closely related one.
Then there is disc_contact, which will be deleted as
well, given that it does not add meaningful information to our
analysis.
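As a sketch of this kind of missingness screen, the snippet below computes the share of NAs per variable and drops those above a threshold. The toy data frame and the 50% cutoff are illustrative assumptions, not the exact values used in our cleaning; in practice the real `data` object would be used.

```r
# Illustrative only: per-variable NA share on a toy data frame
toy <- data.frame(
  left_right = c(1, NA, 3, NA, 5),
  disc_where = c(NA, NA, NA, "At work", NA),
  trans_docs = c("Yes", "No", NA, "Yes", "No")
)

na_share <- sapply(toy, function(x) mean(is.na(x)))
print(round(na_share, 2))

# Drop variables whose missingness exceeds a (hypothetical) 50% threshold
keep <- names(na_share)[na_share <= 0.5]
toy_reduced <- toy[, keep, drop = FALSE]
```

Applied to the real data, this reproduces the decision above: a variable like disc_where, with missingness far above any reasonable cutoff, would be dropped.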
Exploratory Data Analysis
Plotting the distribution of the numeric variables
# Identify numeric variables
numeric_vars <- names(data)[sapply(data, is.numeric)]
numeric_vars <- numeric_vars[numeric_vars != "serialid"]
# Creating histograms and calculating skewness
for (var in numeric_vars) {
p <- ggplot(data, aes(x = .data[[var]])) +
geom_histogram(binwidth = 1, fill = "blue", color = "black") +
labs(title = paste("Histogram of", var), x = var, y = "Count") +
theme_minimal()
print(p)}Analysis of individual-level variables
We must keep in mind that our target variable is qc19: "Do you
think that transgender persons should be able to change their civil
documents to match their inner gender identity?"
data_percent <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
group_by(trans_docs) |>
summarise(count = n()) |>
mutate(percentage = count / sum(count) * 100)
ggplot(data_percent, aes(x = trans_docs, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity") +
geom_text(aes(label = sprintf("%.1f%%", percentage)),
vjust = -0.5, size = 4, color = "black") +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(
title = "Support for the right of trans people to change their gender in civil documents",
x = "Support for Transgender Rights",
y = "Percentage"
) +
theme_minimal() +
theme(legend.position = "none")Taking a look at our variable of interest alone, we see that 52.7% of our sample are in favor of trans people being able to change their gender in their civil documents. However, there is also significant opposition (35.3%) and 12% who answered "Don't know". We will explore this distribution along the variables we consider most important for our analysis.
Sociodemographic variables
Gender
data_summary <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
count(trans_docs, gender, name = "n") |>
group_by(gender) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = gender, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their gender in civil documents",
x = "Gender",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
# Create a contingency table
contingency_table <- table(data$gender, data$trans_docs)
# Run the chi-squared test
chisq_test <- chisq.test(contingency_table)
print(chisq_test)##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 92.715, df = 1, p-value < 2.2e-16
Men exhibit a lower percentage of favorable or "Don't know" responses and a higher rate of rejection than women. There is a statistically significant association between gender and our target variable, with women being more supportive.
Age
data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
mutate(age_bin = cut(age, breaks = seq(min(age, na.rm = TRUE), max(age, na.rm = TRUE), by = 10), include.lowest = TRUE)) |>
# There are 4 observations above 95 y.o. that do not fall within any of the previously defined bins. Given their low number we drop them
drop_na(age_bin) |>
count(age_bin, trans_docs) |>
group_by(age_bin) |>
mutate(percentage = n / sum(n) * 100) |>
ggplot(aes(x = age_bin, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
labs(x = "Age Bins", y = "Proportion of responses", fill = "Support the right",
title = "Support for the right of trans people to change their gender in civil documents") +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
theme_minimal() +
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Younger people seem more likely to support the right of transgender people to have their gender changed on official documents.
We also notice a large increase in NAs among older people, meaning they are less likely to answer this question (perhaps because of unfamiliarity with the topic).
So how would that distribution look if we removed the NAs?
data |>
drop_na(trans_docs) |>
mutate(age_bin = cut(age, breaks = seq(min(age, na.rm = TRUE), max(age, na.rm = TRUE), by = 10), include.lowest = TRUE)) |>
# There are 4 observations above 95 y.o. that do not fall within any of the previously defined bins. Given their low number we drop them
drop_na(age_bin) |>
count(age_bin, trans_docs) |>
group_by(age_bin) |>
mutate(percentage = n / sum(n) * 100) |>
ggplot(aes(x = age_bin, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
labs(x = "Age Bins", y = "Proportion of responses", fill = "Support the right",
title = "Support for the right of trans people to change their gender in civil documents") +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
theme_minimal() +
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)We see that the differences among age groups are no longer as evident.
Religiosity
We have two variables related to religiosity: religion
(the religious affiliation professed by the respondent) and
religious_minority (whether or not the respondent belongs
to a religious minority group).
For the first one, it is better to show a cross table instead of a plot, since there are many categories.
CrossTable(data$religion, data$trans_docs,
digits = 2,
expected = FALSE,
asresid = TRUE,
chisq = TRUE,
prop.chisq = FALSE,
format = "SPSS")##
## Cell Contents
## |-------------------------|
## | Count |
## | Row Percent |
## | Column Percent |
## | Total Percent |
## | Adj Std Resid |
## |-------------------------|
##
## Total Observations in Table: 23792
##
## | data$trans_docs
## data$religion | Yes | No | Row Total |
## -------------------|-----------|-----------|-----------|
## Catholic | 5605 | 4241 | 9846 |
## | 56.93% | 43.07% | 41.38% |
## | 39.27% | 44.55% | |
## | 23.56% | 17.83% | |
## | -8.09 | 8.09 | |
## -------------------|-----------|-----------|-----------|
## Orthodox Christian | 1421 | 1959 | 3380 |
## | 42.04% | 57.96% | 14.21% |
## | 9.96% | 20.58% | |
## | 5.97% | 8.23% | |
## | -22.99 | 22.99 | |
## -------------------|-----------|-----------|-----------|
## Protestant | 1866 | 807 | 2673 |
## | 69.81% | 30.19% | 11.23% |
## | 13.07% | 8.48% | |
## | 7.84% | 3.39% | |
## | 11.00 | -11.00 | |
## -------------------|-----------|-----------|-----------|
## Other Christian | 597 | 449 | 1046 |
## | 57.07% | 42.93% | 4.40% |
## | 4.18% | 4.72% | |
## | 2.51% | 1.89% | |
## | -1.97 | 1.97 | |
## -------------------|-----------|-----------|-----------|
## Other | 764 | 446 | 1210 |
## | 63.14% | 36.86% | 5.09% |
## | 5.35% | 4.68% | |
## | 3.21% | 1.87% | |
## | 2.30 | -2.30 | |
## -------------------|-----------|-----------|-----------|
## Muslim | 159 | 160 | 319 |
## | 49.84% | 50.16% | 1.34% |
## | 1.11% | 1.68% | |
## | 0.67% | 0.67% | |
## | -3.72 | 3.72 | |
## -------------------|-----------|-----------|-----------|
## Non-believers | 3860 | 1458 | 5318 |
## | 72.58% | 27.42% | 22.35% |
## | 27.05% | 15.32% | |
## | 16.22% | 6.13% | |
## | 21.28 | -21.28 | |
## -------------------|-----------|-----------|-----------|
## Column Total | 14272 | 9520 | 23792 |
## | 59.99% | 40.01% | |
## -------------------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 973.2956 d.f. = 6 p = 5.330102e-207
##
##
##
## Minimum expected frequency: 127.6429
The extremely small p-value confirms that the relationship between
religion and trans_docs is highly
statistically significant.
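With samples this large, even weak associations reach significance, so it can help to complement the p-value with an effect size such as Cramér's V (available in DescTools, loaded earlier). As an illustration only, here it is computed on a subset of the table above, using the counts from the Catholic, Orthodox Christian and Protestant rows:

```r
library(DescTools)

# Counts copied from the CrossTable output above (three religion rows)
tab <- matrix(c(5605, 4241,
                1421, 1959,
                1866,  807),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Catholic", "Orthodox Christian", "Protestant"),
                              c("Yes", "No")))
CramerV(tab)  # roughly 0.17: statistically strong evidence, but a modest effect
```

This illustrates why we interpret the tiny p-value with some caution: the association is real but not overwhelming in magnitude.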
data_summary <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
mutate(religious_minority = fct_recode(religious_minority, "Yes" = "1", "No" = "0")) |>
count(trans_docs, religious_minority, name = "n") |>
group_by(religious_minority) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = religious_minority, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their gender in civil documents",
x = "Are you part of a religious minority?",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Ideology
data_summary <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
count(trans_docs, left_right, name = "n") |>
group_by(left_right) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = left_right, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their gender in civil documents",
x = "Political affiliation",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Left-wing respondents show the strongest support for trans rights. Right-wing respondents are the most divided, with high levels of opposition. Non-response is highest among those without an ideological alignment.
Education
ggplot(data, aes(x = trans_docs, y = years_edu)) +
geom_violin() +
labs(x = "Support for trans people changing their civil documents", y = "Age when stopped full-time education") +
ggtitle("Distribution of support by education")## Warning: Removed 387 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
Support doesn’t seem to be related to education.
Area of residence
data_summary <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
count(trans_docs, community, name = "n") |>
group_by(community) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = community, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their \ngender in civil documents",
x = "Place of residence",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Support does not vary much by area of residence.
Other individual-level variables
Having trans friends
data_summary <- data |>
mutate(trans_docs = fct_na_value_to_level(trans_docs, "Don't Know")) |>
count(trans_docs, friends_trans, name = "n") |>
group_by(friends_trans) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = friends_trans, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their \ngender in civil documents",
x = "Do you know a transgender person?",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Having trans friends is associated with higher support, while those who refuse to disclose whether they know trans people tend to be more opposed.
Country-level EDA
country_support <- data %>%
mutate(
trans_docs = case_when(
is.na(trans_docs) ~ "DK",
trans_docs == "Yes" ~ "1",
trans_docs == "No" ~ "2",
TRUE ~ as.character(trans_docs)
)
) %>%
group_by(country, trans_docs = trans_docs) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(country) %>%
mutate(proportion = (count / sum(count))*100) %>%
ungroup()trans_docs_prop <- country_support %>%
filter(trans_docs == "1") %>%
select(country, proportion) %>%
rename(trans_support = proportion)
country_long <- country_level_data %>%
inner_join(trans_docs_prop, by = c("iso2c" = "country")) |>
pivot_longer(cols = c(gdp_pc_ppp, gender_inequality_index, lgbt_policy_index, democracy_index),
names_to = "variable", values_to = "value")
ggplot(country_long, aes(x = value, y = trans_support)) +
geom_smooth(method = "lm", se = FALSE, color = "lightgray") + # we can remove this
geom_point(size = 2, color = "blue", alpha = 0.7) +
geom_text(aes(label = iso3c), vjust = -0.5, hjust = 0.5, size = 3) +
facet_wrap(~variable, scales = "free_x") +
labs(
title = "Relationship between country-level variables and trans support",
x = "Country-level variable",
y = "Proportion of Yes answers"
) +
theme_minimal()## `geom_smooth()` using formula = 'y ~ x'
The Democracy and LGBT Policy indices seem to have a positive linear relationship with the target variable, while the Gender Inequality index has a negative one. The relationship with GDP per capita is non-linear (logarithmic?): there appears to be a "ceiling" of support, beyond which further increases in GDP per capita lose effect.
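One informal way to probe the suspected logarithmic shape is to compare a linear fit of support on GDP with a fit on log GDP and see which explains more variance. The sketch below uses simulated data rather than the real country_long object, so the numbers are purely illustrative:

```r
# Simulated illustration of linear vs. logarithmic fits for support ~ GDP
set.seed(42)
gdp <- runif(28, 5000, 90000)                 # 28 fictitious country GDP values
support <- 10 * log(gdp) + rnorm(28, sd = 1)  # true relationship is logarithmic

fit_linear <- lm(support ~ gdp)
fit_log <- lm(support ~ log(gdp))

# If the true curve flattens at high GDP, the log specification fits better
c(linear = summary(fit_linear)$r.squared,
  log = summary(fit_log)$r.squared)
```

With only 28 countries the comparison is suggestive at best, but a clearly higher R-squared for the log specification would be consistent with the "ceiling" pattern seen in the scatterplots.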
Boxplots for country-level variables:
ggplot(country_long, aes(y = value, x = variable)) +
geom_boxplot(outlier.color = "red", outlier.shape = 16, fill = "lightblue") +
coord_flip() +
theme_minimal() +
labs(title = "Distribution of country-level variables", x = NULL, y = "value") +
facet_wrap(~variable, scales = "free") +
theme(axis.text.y = element_blank(),
axis.ticks.x = element_blank())country_level_data_cor <- country_level_data %>%
inner_join(trans_docs_prop, by = c("iso2c" = "country"))
cor_matrix <- cor(country_level_data_cor %>%
select(gdp_pc_ppp, democracy_index, gender_inequality_index,
lgbt_policy_index, trans_support),
use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))There are some strong correlations, which is slightly worrying; but since we will use these data to build cross-level interactions, this should not compromise the reliability of our analysis too much.
Paradata
# Convert labelled variables into factors
paradata <- paradata %>%
mutate(across(where(~ !is.null(attr(., "labels"))), labelled::to_factor))
# Merging datasets for the analysis
merged_paradata <- merge(data, paradata, by = "serialid", all.x = TRUE)Number of persons present during interview
library(gmodels)
# Convert NA into an explicit category
merged_paradata$trans_docs <- as.character(merged_paradata$trans_docs)
merged_paradata$trans_docs[is.na(merged_paradata$trans_docs)] <- "DK"
merged_paradata$trans_docs <- as.factor(merged_paradata$trans_docs)
# CrossTable
CrossTable(merged_paradata$p4, merged_paradata$trans_docs,
digits=2,
expected=F,
asresid=T,
chisq=TRUE,
prop.chisq=F,
format="SPSS")##
## Cell Contents
## |-------------------------|
## | Count |
## | Row Percent |
## | Column Percent |
## | Total Percent |
## | Adj Std Resid |
## |-------------------------|
##
## Total Observations in Table: 27438
##
## | merged_paradata$trans_docs
## merged_paradata$p4 | DK | No | Yes | Row Total |
## ---------------------------------|-----------|-----------|-----------|-----------|
## Two (interviewer and respondent) | 2806 | 8146 | 12428 | 23380 |
## | 12.00% | 34.84% | 53.16% | 85.21% |
## | 85.55% | 84.02% | 85.93% | |
## | 10.23% | 29.69% | 45.29% | |
## | 0.58 | -4.10 | 3.54 | |
## ---------------------------------|-----------|-----------|-----------|-----------|
## Three | 428 | 1316 | 1758 | 3502 |
## | 12.22% | 37.58% | 50.20% | 12.76% |
## | 13.05% | 13.57% | 12.16% | |
## | 1.56% | 4.80% | 6.41% | |
## | 0.52 | 2.97 | -3.19 | |
## ---------------------------------|-----------|-----------|-----------|-----------|
## Four | 36 | 189 | 218 | 443 |
## | 8.13% | 42.66% | 49.21% | 1.61% |
## | 1.10% | 1.95% | 1.51% | |
## | 0.13% | 0.69% | 0.79% | |
## | -2.50 | 3.25 | -1.49 | |
## ---------------------------------|-----------|-----------|-----------|-----------|
## Five or more | 10 | 44 | 59 | 113 |
## | 8.85% | 38.94% | 52.21% | 0.41% |
## | 0.30% | 0.45% | 0.41% | |
## | 0.04% | 0.16% | 0.22% | |
## | -1.02 | 0.80 | -0.11 | |
## ---------------------------------|-----------|-----------|-----------|-----------|
## Column Total | 3280 | 9695 | 14463 | 27438 |
## | 11.95% | 35.33% | 52.71% | |
## ---------------------------------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 26.44717 d.f. = 6 p = 0.0001837381
##
##
##
## Minimum expected frequency: 13.50827
The table shows a significant association between the number of people present during the interview and support for transgender people being able to change their documents.
Duration of interview
Regarding the duration of the interview we have two options,
p3 and p3r. We will use p3r to
keep the analysis simpler.
# CrossTable
CrossTable(merged_paradata$p3r, merged_paradata$trans_docs,
digits=2,
expected=F,
asresid=T,
chisq=TRUE,
prop.chisq=F,
format="SPSS")##
## Cell Contents
## |-------------------------|
## | Count |
## | Row Percent |
## | Column Percent |
## | Total Percent |
## | Adj Std Resid |
## |-------------------------|
##
## Total Observations in Table: 27438
##
## | merged_paradata$trans_docs
## merged_paradata$p3r | DK | No | Yes | Row Total |
## --------------------|-----------|-----------|-----------|-----------|
## Up to 14 minutes | 73 | 224 | 260 | 557 |
## | 13.11% | 40.22% | 46.68% | 2.03% |
## | 2.23% | 2.31% | 1.80% | |
## | 0.27% | 0.82% | 0.95% | |
## | 0.85 | 2.43 | -2.88 | |
## --------------------|-----------|-----------|-----------|-----------|
## 15 - 29 minutes | 782 | 2284 | 3456 | 6522 |
## | 11.99% | 35.02% | 52.99% | 23.77% |
## | 23.84% | 23.56% | 23.90% | |
## | 2.85% | 8.32% | 12.60% | |
## | 0.10 | -0.61 | 0.52 | |
## --------------------|-----------|-----------|-----------|-----------|
## 30 - 44 minutes | 1429 | 4388 | 6305 | 12122 |
## | 11.79% | 36.20% | 52.01% | 44.18% |
## | 43.57% | 45.26% | 43.59% | |
## | 5.21% | 15.99% | 22.98% | |
## | -0.75 | 2.66 | -2.06 | |
## --------------------|-----------|-----------|-----------|-----------|
## 45 - 59 minutes | 703 | 1993 | 3039 | 5735 |
## | 12.26% | 34.75% | 52.99% | 20.90% |
## | 21.43% | 20.56% | 21.01% | |
## | 2.56% | 7.26% | 11.08% | |
## | 0.80 | -1.04 | 0.48 | |
## --------------------|-----------|-----------|-----------|-----------|
## 60 - 74 minutes | 202 | 529 | 908 | 1639 |
## | 12.32% | 32.28% | 55.40% | 5.97% |
## | 6.16% | 5.46% | 6.28% | |
## | 0.74% | 1.93% | 3.31% | |
## | 0.48 | -2.67 | 2.25 | |
## --------------------|-----------|-----------|-----------|-----------|
## 75 - 89 minutes | 49 | 123 | 276 | 448 |
## | 10.94% | 27.46% | 61.61% | 1.63% |
## | 1.49% | 1.27% | 1.91% | |
## | 0.18% | 0.45% | 1.01% | |
## | -0.67 | -3.52 | 3.80 | |
## --------------------|-----------|-----------|-----------|-----------|
## 90 minutes and more | 42 | 154 | 219 | 415 |
## | 10.12% | 37.11% | 52.77% | 1.51% |
## | 1.28% | 1.59% | 1.51% | |
## | 0.15% | 0.56% | 0.80% | |
## | -1.16 | 0.76 | 0.02 | |
## --------------------|-----------|-----------|-----------|-----------|
## Column Total | 3280 | 9695 | 14463 | 27438 |
## | 11.95% | 35.33% | 52.71% | |
## --------------------|-----------|-----------|-----------|-----------|
##
##
## Statistics for All Table Factors
##
##
## Pearson's Chi-squared test
## ------------------------------------------------------------
## Chi^2 = 36.84302 d.f. = 12 p = 0.0002368813
##
##
##
## Minimum expected frequency: 49.61003
Respondent cooperation
data_summary <- merged_paradata |>
mutate(trans_docs = fct_recode(trans_docs, "Don't Know" = "DK"))|>
count(p5, trans_docs, name = "n") |>
group_by(p5) |>
mutate(percentage = n / sum(n))
ggplot(data_summary, aes(x = p5, y = percentage, fill = trans_docs)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
position = position_stack(vjust = 0.5), size = 4) +
scale_y_continuous(labels = percent_format()) +
scale_fill_manual(
values = c("Yes" = "#2ECC71", "No" = "#E74C3C", "Don't Know" = "#3498DB")
) +
labs(
title = "Support for the right of trans people to change their gender in civil documents",
x = "Respondent cooperation in the interview",
y = "Proportion of Responses",
fill = "Support the right"
) +
coord_flip() +
theme_minimal()+
theme(
panel.grid = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Respondents who cooperated more during the interview were more likely to express support.
Correlation and multicollinearity
Now we want to examine possible multicollinearity among our explanatory variables. To do so, we calculate the correlations between our numerical variables. As there are too many variables to display at once, we visualize only those with a strong correlation, to focus on the key dependencies.
cor_matrix <- cor(data %>%
select_if(is.numeric) %>%
select(-serialid),
use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))# Create a matrix where only correlations >= 0.7 are maintained
cor_filtered <- ifelse(abs(cor_matrix) >= 0.7, cor_matrix, NA)
# Graph
corrplot(cor_filtered, method = "color", type = "upper",
tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200),
na.label = " ",
insig = "blank") The strongest correlations are among variables measuring perceived discrimination based on race, religion or sexual orientation. Consequently, we group them into three new composite variables by averaging the related items, in order to reduce multicollinearity.
# Creating grouped variables
data <- data %>%
mutate(
racial_discri = rowMeans(select(., roma_discri, black_discri, asian_discri), na.rm = TRUE), #no white
sexual_discri = rowMeans(select(., lgb_discri, trans_discri, intersex_discri), na.rm = TRUE),
religious_discri = rowMeans(select(., jewish_discri, muslim_discri, buddihst_discri), na.rm = TRUE)
)
# Removing original variables to avoid redundancy
data <- data |>
select(-c(roma_discri, black_discri, asian_discri,
lgb_discri, trans_discri, intersex_discri,
jewish_discri, muslim_discri, buddihst_discri))
# Graphical representation
cor_matrix <- cor(data %>%
select_if(is.numeric) %>%
select(-serialid),
use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))We see that our new variables are still highly correlated with each other (bottom-right corner). To reduce the dimensionality of the dataset further while preserving information, we create a single variable for overall minority discrimination.
# Creating grouped variable
data <- data %>%
mutate(
minority_discri = rowMeans(select(., racial_discri, sexual_discri,
religious_discri, disability_discri), na.rm = TRUE)) We decided to exclude white_discri,
atheist_discri and christian_discri because we
realized they had a median of 1 and a fairly small mean, meaning there is
almost no reported discrimination against these groups in the
dataset.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.000 1.000 1.785 2.000 10.000 403
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.000 1.500 2.802 4.000 10.000 517
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.000 1.000 1.904 2.000 10.000 414
data <- data |>
select(-c(white_discri, atheist_discri, christian_discri, racial_discri,
sexual_discri, religious_discri, disability_discri))Final check of the correlation matrix to confirm that multicollinearity has been reduced.
cor_matrix <- cor(data %>%
select_if(is.numeric) %>%
select(-serialid),
use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
addCoef.col = "black", tl.col = "black",
col = colorRampPalette(c("blue", "white", "red"))(200))Class imbalance
##
## Yes No <NA>
## 0.5271157 0.3533421 0.1195422
##
## Yes No
## 0.5986837 0.4013163
Our target variable has roughly a 60/40 distribution once NAs are excluded, so we are not worried about class imbalance.
Less represented classes in other variables have already been aggregated where deemed necessary.
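The two proportion tables above can be reproduced with prop.table; a minimal sketch on a toy factor (in practice data$trans_docs would be used):

```r
# Toy factor standing in for data$trans_docs
x <- factor(c("Yes", "Yes", "No", NA, "Yes"), levels = c("Yes", "No"))

# Proportions with NAs counted as their own category
prop.table(table(x, useNA = "ifany"))

# Proportions among respondents only (NAs excluded)
prop.table(table(x))
```

The useNA = "ifany" variant corresponds to the first table above (Yes/No/NA shares of all interviews); the default variant corresponds to the second (Yes/No shares among those who answered).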
Analysis of missing values
Here we analyse the NA responses to our target variable. This is a robustness check to understand our data and learn whether there are patterns in non-response that we should worry about, e.g. whether specific demographics are less likely to respond.
Overall, the analysis below shows that non-response follows the descriptive data and aligns with the groups that reported "No" to supporting trans_docs in qc19. So we are more at risk of underreporting "No" votes in this survey due to non-response. The correlations are low, though, so this is not a major concern.
Who is more likely to not respond to our target?
There are 3,280 NA responses to the target variable. To assess whether these values are missing at random, we test the association between a DK ("don't know") indicator for our target variable and our other variables.
In the descriptive analysis above, we have already seen in raw terms that NA responses were more frequent from some groups e.g.
women,
older people,
people who also responded NA for political ideology,
people who do not have trans friends and those who refused to respond to the friendship question,
people with lower survey cooperation.
By country, we already understand from the challenge description that the NA responses vary. This is a significant disparity.
Create a binary variable for the DK
cntry_name <- codelist |> select(iso2c, country_name = country.name.en)
dk_target <- data |>
mutate(target_NA = ifelse(is.na(trans_docs), 1, 0)) |>
left_join(cntry_name, by = join_by(country == iso2c)) |>
dplyr::select(-trans_docs, -serialid, -country)
table(dk_target$target_NA)
##
## 0 1
## 24158 3280
Confirm DK rates by country. They vary from 1.4% in Belgium to 28.5% in Bulgaria. Overall, the average is 11.95%.
dk_target |>
group_by(country_name) |>
summarise(count_na = sum(target_NA),
num_resp = length(country_name),
pct_na = count_na/num_resp*100) |>
ggplot(aes(x=reorder(country_name, -pct_na), y = pct_na))+
geom_col()+
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank()) +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(y = "Proportion of NA responses")
If all NA responses to the target are removed, this is how much the "Yes" share increases for each country. The increase is not proportional across countries: removing the NA responses makes the Yes/No differences between countries appear larger.
data |>
group_by(country) |>
summarise(yes_count = sum(trans_docs == "Yes", na.rm=TRUE),
num_resp = sum(!is.na(trans_docs)),
num_na = sum(is.na(trans_docs)),
pct_yes_withna = yes_count / length(country)*100,
pct_yes = yes_count/num_resp*100)|>
ggplot(aes(x = reorder(country, -pct_yes))) +
geom_col(aes(y = pct_yes), fill = "red") + # First bar (pct_yes)
geom_col(aes(y = pct_yes_withna), fill = "yellow", alpha = 0.5) + # Second bar (pct_yes_withna)
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank()) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0,100)) +
labs(title ="Support for transgender rights to legally change documents, with and without NA's")
Check correlation with DK of target
Run models to test what is correlated with DK response, using Cramér's V for association with the factor variables and Pearson correlation for the numeric variables in the cleaned data.
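For reference, what `DescTools::CramerV()` computes can be sketched in a few lines of base R from the chi-squared statistic (V = sqrt(chi² / (n · (min(rows, cols) − 1)))); a toy 2×2 example:

```r
# Cramér's V from a contingency table
cramers_v_manual <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}
# toy contingency table: rows = NA indicator, columns = a factor
tab <- matrix(c(30, 10, 20, 40), nrow = 2)
cramers_v_manual(tab)  # about 0.41
```

Values near 0 indicate little association, which is the pattern we see for the factors below.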
- Model the data for DK responses against the factor variables.
factor_subset <- dk_target %>%
dplyr::select(target_NA, where(is.factor)) |>
mutate(target_NA = factor(target_NA))
# Run cramers V for factors
target_var <- "target_NA"
factor_variables <- names(factor_subset)
factor_variables <- factor_variables[factor_variables != target_var]
cramers_v_results <- list()
for (variable in factor_variables) {
contingency_table <- table(factor_subset[[target_var]],
factor_subset[[variable]])
cramers_v <- CramerV(contingency_table)
cramers_v_results[[variable]] <- cramers_v
}
# Create a Tibble for Results
cat_results <- tibble(variable = names(cramers_v_results),
cramers_v = unlist(cramers_v_results)) |>
arrange(desc(cramers_v))
cat_results
## # A tibble: 22 × 2
## variable cramers_v
## <chr> <dbl>
## 1 internet_use 0.136
## 2 social_class 0.0975
## 3 occupation 0.0852
## 4 marital_status 0.0831
## 5 gender_docs 0.0820
## 6 religion 0.0763
## 7 phone_access 0.0711
## 8 life_sat 0.0642
## 9 polintr 0.0621
## 10 friends_trans 0.0581
## # ℹ 12 more rows
All of the factors show quite low association with the NA responses, the highest being internet use (Cramér's V ≈ 0.14). This is a good sign that the missing values are close to random.
- Look at the numeric variables and test correlations:
numeric_subset <- dk_target %>%
dplyr::select(target_NA, where(is.numeric))
target_var <- "target_NA"
numeric_variables <- names(numeric_subset)[names(numeric_subset) != target_var]
cor_results <- list()
for (variable in numeric_variables) {
correlation <- cor(numeric_subset$target_NA,
numeric_subset[[variable]],
use = "pairwise.complete.obs")
cor_results[[variable]] <- correlation
}
# Create a Tibble for Results
cor_results <- tibble(variable = names(cor_results),
cor = unlist(cor_results)) |>
arrange(cor)
cor_results
## # A tibble: 7 × 2
## variable cor
## <chr> <dbl>
## 1 n_actions_against_discri -0.0988
## 2 n_friends_minorities -0.0965
## 3 years_edu -0.0544
## 4 social_alienation 0.0566
## 5 minority_discri 0.0635
## 6 age 0.0982
## 7 antilgbtq_rights 0.108
Again, this is not much cause for concern: the correlations with the NA indicator, positive and negative, are all within roughly ±0.10.
Modelling relationship with key variables and non-response
Here we run a logistic regression to test for the most important/significant variables. First, we make a subset of the data with only the variables most correlated in the section above (numeric variables beyond ±9.5% correlation and factors above 0.10 Cramér's V) and also include gender and country.
## [1] "age" "antilgbtq_rights"
## [1] "internet_use"
Now model with those variables to see if they are significant in explaining the DK values. Gender will also be included as a key variable.
We run a stepwise AIC on these most correlated variables and then use the best model to test the overall model fit.
# first, run stepwise to determine most important
# run base model
dk_fit_null <- glm(target_NA ~ 1,
data = dk_target,
family = "binomial")
# run full fit model of most correlated vars with country (also age^2 for good practice)
dk_fit <- glm(target_NA ~ gender + age + I(age*age) + antilgbtq_rights + internet_use + social_class + country_name,
data = dk_target,
family = "binomial")
# use stepwise to select best model
aic_1 <- MASS::stepAIC(dk_fit, scope = list(upper = dk_fit,
lower = dk_fit_null),
direction = "both", k = 2, trace=0) # forward based on AIC
# filter vars significant at 5% and calculate exponential
broom::tidy(aic_1) |>
filter(p.value<0.05) |>
mutate(oddsratio = exp(estimate)) |>
select(term, oddsratio, p.value)
## # A tibble: 29 × 3
## term oddsratio p.value
## <chr> <dbl> <dbl>
## 1 (Intercept) 0.0500 8.78e-40
## 2 genderWoman 1.20 2.01e- 5
## 3 I(age * age) 1.00 4.77e- 4
## 4 antilgbtq_rights 1.13 9.69e- 7
## 5 internet_useTwo or three times a week 1.21 2.72e- 2
## 6 internet_useTwo or three times a month 0.307 4.97e- 3
## 7 internet_useNever/No access 1.28 2.49e- 4
## 8 internet_useNo Internet access at all 1.57 4.62e- 4
## 9 social_classThe lower middle class of society 0.766 4.62e- 5
## 10 social_classThe middle class of society 0.713 4.09e-11
## # ℹ 19 more rows
The best model drops age but keeps age^2. So we run the full model to keep the lower-order term included.
The model has some interesting findings about who did not respond to our target question. Some things we can see from the odds ratios (excluding interpretation of countries, as that is covered above):
women are around 20% more likely to give NA responses
people holding anti-LGBTI-rights views were about 10% more likely to not respond
internet use was significant but hard to interpret: compared with everyday use, having no internet access (OR ≈ 1.6) and using it two or three times a week (OR ≈ 1.2) carried higher NA likelihood, while use two or three times a month carried lower (OR ≈ 0.3)
social class was significant: compared with the working class, respondents in the lower-middle and middle classes were less likely to give an NA response (odds ratios below 1), i.e. working-class respondents were less likely to answer this question.
Below is an alternative approach using undersampling (to reduce the class imbalance between NA and non-NA responses on our target variable).
na_model <- glm(target_NA ~ .,
data = dk_target,
family = "binomial")
broom::tidy(na_model) |> arrange(desc(estimate))
## # A tibble: 89 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 country_nameBulgaria 1.26 0.346 3.63 0.000280
## 2 country_nameSweden 0.822 0.320 2.56 0.0103
## 3 religionMuslim 0.692 0.325 2.13 0.0335
## 4 country_nameLatvia 0.687 0.321 2.14 0.0323
## 5 country_nameCyprus 0.619 0.450 1.38 0.169
## 6 country_nameGermany 0.552 0.303 1.82 0.0683
## 7 country_nameUnited Kingdom 0.545 0.329 1.66 0.0977
## 8 gender_docsNo 0.536 0.0930 5.77 0.00000000796
## 9 country_nameFinland 0.507 0.328 1.54 0.123
## 10 country_namePoland 0.501 0.326 1.54 0.124
## # ℹ 79 more rows
Approach to test relationship to DK with splitting and undersampling:
# create new dataset with just vars we want, including age^2
dk_target2 <- dk_target |>
select(target_NA, age, antilgbtq_rights, internet_use, social_class, country_name) |>
mutate(age_sq = age*age, .after = age)
# to deal with some class imbalance, undersample from the majority group (NA = 0)
set.seed(123)
# Generate an index
index <- createDataPartition(dk_target2$target_NA, p = 0.7, list = FALSE, times = 1)
# Subset the dataframe
train <- dk_target2[index, ]
test <- dk_target2[-index, ]
# check splits
prop.table(table(train$target_NA))
##
## 0 1
## 0.8787421 0.1212579
##
## 0 1
## 0.8844612 0.1155388
Now undersample the majority
set.seed(123)
under <- ovun.sample(target_NA~.,
data=train,
method = "under",
N = 4000)$data
# limit the sample to 4000 obs as we have ~2300 for target minority class.
table(under$target_NA)
##
## 0 1
## 2112 1888
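What `ovun.sample(method = "under")` does can be sketched in base R: sample the majority class down to roughly the size of the minority class. A toy example with a hypothetical data frame `df` (not the ROSE implementation itself):

```r
# toy sketch of undersampling: 900 majority (0) vs 100 minority (1) rows
set.seed(123)
df <- data.frame(y = rep(c(0, 1), c(900, 100)))
majority <- df[df$y == 0, , drop = FALSE]
minority <- df[df$y == 1, , drop = FALSE]
# keep all minority rows, randomly sample an equal number of majority rows
under_df <- rbind(minority, majority[sample(nrow(majority), nrow(minority)), , drop = FALSE])
table(under_df$y)  # balanced: 100 of each class
```

ROSE's `N` argument instead fixes the total sample size, which is why our classes above (2112 vs 1888) are close to balanced rather than exactly equal.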
##
## Call:
## randomForest(formula = target_NA ~ ., data = under)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 0.2389922
## % Var explained: 4.1
We see that only around 4% of the variance is explained.
Testing the NA values against paradata
Here we extend the model to see whether the paradata is related to non-response.
para <- paradata |>
bind_cols(trans_docs = data$trans_docs,
age = data$age,
gender = data$gender,
country = data$country) |>
mutate(target_NA = ifelse(is.na(trans_docs), 1, 0)) |>
select(-serialid, -trans_docs)
summary(para$target_NA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1195 0.0000 1.0000
## tibble [27,438 × 9] (S3: tbl_df/tbl/data.frame)
## $ p2 : Factor w/ 6 levels "Before 8 h","8 - 12 h",..: 4 4 3 4 2 4 5 2 4 4 ...
## ..- attr(*, "label")= chr "TIME OF INTERVIEW"
## $ p3 : Factor w/ 246 levels "2 minutes","3",..: 44 39 37 30 30 25 26 33 39 39 ...
## ..- attr(*, "label")= chr "DURATION OF INTERVIEW"
## $ p3r : Factor w/ 7 levels "Up to 14 minutes",..: 4 3 3 3 3 2 2 3 3 3 ...
## ..- attr(*, "label")= chr "DURATION OF INTERVIEW (RECODED)"
## $ p4 : Factor w/ 5 levels "Two (interviewer and respondent)",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "N OF PERSONS PRESENT DURING INTERVIEW"
## $ p5 : Factor w/ 5 levels "Excellent","Fair",..: 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "RESPONDENT COOPERATION"
## $ age : num [1:27438] 51 62 38 29 63 41 48 88 44 45 ...
## $ gender : Factor w/ 2 levels "Man","Woman": 1 2 2 1 2 2 1 1 2 2 ...
## $ country : chr [1:27438] "BE" "BE" "BE" "BE" ...
## $ target_NA: num [1:27438] 0 0 0 0 0 0 0 0 0 0 ...
First, we again test association, using Cramér's V since the paradata variables are factors:
target_var <- "target_NA"
para_vars <- names(para)[names(para) != target_var]
cramers_v_results <- list()
for (variable in para_vars) {
contingency_table <- table(para[[target_var]],
para[[variable]])
contingency_table_no_dk <- contingency_table[, colnames(contingency_table) != "DK"]
cramers_v <- CramerV(contingency_table_no_dk)
# the DK column is all zeros, which creates errors, so we drop it
cramers_v_results[[variable]] <- cramers_v
}
# Create a Tibble for Results
tibble(variable = names(cramers_v_results),
cramers_v = unlist(cramers_v_results)) |>
arrange(desc(cramers_v))
## # A tibble: 8 × 2
## variable cramers_v
## <chr> <dbl>
## 1 country 0.168
## 2 age 0.123
## 3 p5 0.106
## 4 p3 0.0916
## 5 gender 0.0292
## 6 p4 0.0165
## 7 p2 0.0116
## 8 p3r 0.0113
Again, we see low association (Cramér's V) between the paradata and non-response.
Second, we run another logistic model with just Paradata included.
na_para_model <- glm(target_NA ~ p2 + p3r + p4 + p5 + age + I(age*age) + gender + factor(country),
data = para,
family=binomial(link = "logit"))
broom::tidy(na_para_model) |>
filter(p.value<0.05) |>
mutate(oddsratio = exp(estimate)) |>
select(term, oddsratio, p.value)
## # A tibble: 25 × 3
## term oddsratio p.value
## <chr> <dbl> <dbl>
## 1 (Intercept) 0.0439 4.65e-12
## 2 p4Four 0.694 4.20e- 2
## 3 p5Fair 1.42 2.05e-15
## 4 p5Average 1.74 8.91e-18
## 5 p5Bad 3.49 2.14e-22
## 6 age 0.983 2.85e- 3
## 7 I(age * age) 1.00 3.43e- 9
## 8 genderWoman 1.14 1.07e- 3
## 9 factor(country)BE 0.179 6.77e- 9
## 10 factor(country)BG 5.30 2.16e-31
## # ℹ 15 more rows
We see that when controlling for age, gender and country, the survey-cooperation variable is significant.
Compared with respondents rated as having excellent cooperation, those with worse cooperation were more likely to give an NA response, and the effect grows as cooperation worsens: fair cooperation carried roughly 40% higher odds of an NA, average cooperation roughly 75% higher, and bad cooperation roughly 3.5 times the odds of an NA response to our target variable.
Overall, the associations with our target variable are low, so we do not have a major concern. Some are interesting, e.g. interviewer-rated cooperation: the worse the cooperation, the more likely the respondent was to skip the question. This could mean we are underreporting "No" votes, because as we saw in the descriptive analysis, the worse the cooperation, the higher the "No" vote on trans_docs. There could also be some reverse causality, as not responding may itself lead to a worse cooperation score. The odds ratios here are nonetheless quite high.
Imputing missing data
As we have different types of variables we decide to go with the default methods. Selecting the best method for each type of variable would be too computationally intensive.
We know that it would be better to impute multiple datasets and then pool the results of different regressions on those different datasets together, but running more than one multilevel regression would be too computationally intensive and so we are not able to do this.
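For reference, the standard multiple-imputation workflow we are forgoing would look like the sketch below. It uses mice's built-in `nhanes` toy data and an illustrative `lm()` analysis model, not our dataset or our multilevel model:

```r
library(mice)
# impute m = 5 complete datasets, fit the analysis model on each,
# then pool the estimates across imputations via Rubin's rules
imp    <- mice(nhanes, m = 5, seed = 1234, printFlag = FALSE)
fits   <- with(imp, lm(bmi ~ age + hyp))
pooled <- pool(fits)
summary(pooled)  # pooled coefficients and standard errors
```

Pooling is what propagates the imputation uncertainty into the standard errors; with m = 1 that uncertainty is ignored, which is the trade-off we accept for computational reasons.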
# Excluding target variable and other useless variables
imputation_data <- data |>
select(-c(serialid, country, trans_docs))
# Running it with the default methods
# By default: numerical variables -> pmm, binary factors -> logreg, > 2 levels factors -> polyreg
imputed_data <- mice(imputation_data, m = 1, seed=1234)
##
## iter imp variable
## 1 1 years_edu community marital_status social_class religion bill_issues life_sat left_right social_alienation gender_docs friends_trans antilgbtq_rights minority_discri
## 2 1 years_edu community marital_status social_class religion bill_issues life_sat left_right social_alienation gender_docs friends_trans antilgbtq_rights minority_discri
## 3 1 years_edu community marital_status social_class religion bill_issues life_sat left_right social_alienation gender_docs friends_trans antilgbtq_rights minority_discri
## 4 1 years_edu community marital_status social_class religion bill_issues life_sat left_right social_alienation gender_docs friends_trans antilgbtq_rights minority_discri
## 5 1 years_edu community marital_status social_class religion bill_issues life_sat left_right social_alienation gender_docs friends_trans antilgbtq_rights minority_discri
## age gender years_edu
## "" "" "pmm"
## community marital_status occupation
## "polyreg" "polyreg" ""
## social_class religion nonEU_national
## "polyreg" "polyreg" ""
## phone_access bill_issues internet_use
## "" "polyreg" ""
## life_sat polintr left_right
## "polyreg" "" "polyreg"
## social_alienation ethnic_minority skincolor_minority
## "pmm" "" ""
## religious_minority roma_minority sexual_minority
## "" "" ""
## disability_minority suffered_discr gender_docs
## "" "" "logreg"
## friends_trans n_friends_minorities n_actions_against_discri
## "polyreg" "" ""
## antilgbtq_rights minority_discri
## "pmm" "pmm"
## age gender years_edu
## 0 0 387
## community marital_status occupation
## 14 175 0
## social_class religion nonEU_national
## 1021 483 0
## phone_access bill_issues internet_use
## 0 379 0
## life_sat polintr left_right
## 102 0 4689
## social_alienation ethnic_minority skincolor_minority
## 1000 0 0
## religious_minority roma_minority sexual_minority
## 0 0 0
## disability_minority suffered_discr gender_docs
## 0 0 3187
## friends_trans n_friends_minorities n_actions_against_discri
## 968 0 0
## antilgbtq_rights minority_discri
## 791 355
## age gender years_edu
## 0 0 0
## community marital_status occupation
## 0 0 0
## social_class religion nonEU_national
## 0 0 0
## phone_access bill_issues internet_use
## 0 0 0
## life_sat polintr left_right
## 0 0 0
## social_alienation ethnic_minority skincolor_minority
## 0 0 0
## religious_minority roma_minority sexual_minority
## 0 0 0
## disability_minority suffered_discr gender_docs
## 0 0 0
## friends_trans n_friends_minorities n_actions_against_discri
## 0 0 0
## antilgbtq_rights minority_discri
## 0 0
Joining together all our data
country_level_data stores all our data defined at the
country level.
final_data stores all our data (after imputation) for
individual level variables.
Explanatory model
Before running our final explanatory model we run a logistic regression with all our individual level predictors to see which variables are significant.
We also run a stepwise regression and Lasso regression to select the most important subset of predictors.
Comparing the outputs of these three regressions gives us hints about which variables and levels are most important to include in our final explanatory model.
Simple logistic regression
We use only individual-level data for now. This is a simple model to see, at the global level (i.e. with no country-level controls), which variables are significant. The numeric data has also been scaled.
datalr <- final_data |>
# Dropping NAs (there should be NAs only in our target variable)
drop_na() |>
# Selecting only predictor variables
select(-"country") |>
# Scale variables
mutate(across(where(is.numeric), scale)) |>
# Transforming target to binary numeric Yes=1, No=0
mutate(trans_docs = as.numeric(trans_docs == "Yes"))
# now run the full model
simple_logistic <- glm(trans_docs ~ ., data = datalr, family = binomial(link = "logit"))
# Save the summary
logistic_summary <- summary(simple_logistic)
# create a tibble to more easily search through significant variables
logistic_results <- tibble(
broom::tidy(simple_logistic) |>
filter(p.value<0.05) |>
mutate(oddsratio = exp(estimate)) |>
select(term, estimate, oddsratio, p.value) |>
arrange(oddsratio))
print(logistic_results, n = Inf)
## # A tibble: 26 × 4
## term estimate oddsratio p.value
## <chr> <dbl> <dbl> <dbl>
## 1 gender_docsNo -2.66 0.0697 0
## 2 friends_transRefusal (SPONTANEOUS) -0.798 0.450 2.05e- 5
## 3 antilgbtq_rights -0.699 0.497 1.16e-194
## 4 roma_minority1 -0.568 0.567 1.43e- 4
## 5 internet_useTwo or three times a week -0.418 0.658 4.29e- 8
## 6 life_satNot at all satisfied -0.395 0.673 3.56e- 4
## 7 internet_useNever/No access -0.348 0.706 1.92e- 7
## 8 bill_issuesFrom time to time -0.335 0.716 8.36e- 6
## 9 internet_useAbout once a week -0.332 0.718 2.13e- 2
## 10 occupationRetired (4 in d15a) -0.321 0.725 1.24e- 4
## 11 occupationStudents (2 in d15a) -0.294 0.745 9.72e- 3
## 12 occupationOther white collars (13 or 14 in d15a) -0.263 0.769 1.61e- 3
## 13 minority_discri -0.233 0.792 2.00e- 24
## 14 ethnic_minority1 -0.231 0.794 3.62e- 2
## 15 suffered_discr1 -0.221 0.802 5.75e- 5
## 16 bill_issuesAlmost never/never -0.211 0.810 4.20e- 3
## 17 left_right(7 -10) Right -0.166 0.847 6.88e- 4
## 18 occupationManual workers (15 to 18 in d15a) -0.161 0.851 4.00e- 2
## 19 life_satNot very satisfied -0.154 0.858 2.26e- 2
## 20 religionNon-believers 0.136 1.15 8.95e- 3
## 21 marital_statusSingle (9-10 in d7) 0.138 1.15 2.42e- 2
## 22 n_friends_minorities 0.161 1.17 6.76e- 14
## 23 age 0.209 1.23 4.77e- 10
## 24 genderWoman 0.263 1.30 5.44e- 12
## 25 phone_accessLandline & mobile 0.275 1.32 2.97e- 12
## 26 (Intercept) 2.69 14.8 4.64e- 77
Interpreting results
To interpret the coefficients as odds ratios, anything above 1 indicates they are more likely to support the changes to civil documents for trans people.
The logistic regression shows 26 statistically significant terms (this includes factor variables with multiple terms). Of our 29 scaled variables, the significant ones (at the 5% level) are:
More likely to OPPOSE the right to change civil documents for trans people:
men (compared to women)
younger people (this is in a way unexpected)
people who oppose legal rights to add a third-gender in official docs (compared to those who do)
people who refused to answer whether they have trans friends (compared to those who do)
people who are more financially sound (have fewer bill issues)
people not satisfied with their life (compared to those who are ‘very satisfied’)
people who identified as roma or an ethnic minority
people who are more discriminatory against minorities
More likely to SUPPORT the right:
older people
women
people who were self-employed (compared to all other occupation types incl students)
unmarried (single) people
non-believers (compared to Catholics, the religion base group)
people with a landline and mobile
people who use the internet everyday/almost everyday (compared to all other internet use categories)
people who reported being more left wing (compared to right wing)
people who had friends in minority groups.
NUMERIC VARIABLES
Without scaling, we interpret logistic regression output as:
coef(simple_logistic): these coefficients represent the
change in the log-odds for a one-unit increase in the corresponding
independent variable
exp(coef(simple_logistic)): the exponential of the slope
coefficient (exp(B)) tells us the change of the odds if the independent
variable increases by one unit
If the data is scaled:
A coefficient B (from a logistic regression) now reflects the change in the log-odds for a one standard deviation increase in the predictor variable, rather than a one-unit increase in the original scale.
The odds ratio (exp(B)) now tells you how the odds change with a one standard deviation increase in the predictor variable, rather than a one-unit increase.
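A minimal numeric illustration of this conversion, using the scaled genderWoman coefficient reported above:

```r
b <- 0.263            # log-odds coefficient for genderWoman
exp(b)                # odds ratio: about 1.30
(exp(b) - 1) * 100    # about a 30% increase in the odds of supporting trans_docs
```

Because gender is a dummy rather than a scaled numeric variable, this particular odds ratio still compares women with men directly; for scaled numeric predictors the same arithmetic applies per standard deviation.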
Here we compare the coefficients before and after exponentiation:
## (Intercept) age
## 2.69303178 0.20879798
## genderWoman years_edu
## 0.26342657 -0.01215087
## communitySmall or middle sized town communityLarge town
## 0.01810837 -0.04039684
## (Intercept) age
## 14.7764069 1.2321960
## genderWoman years_edu
## 1.3013817 0.9879227
## communitySmall or middle sized town communityLarge town
## 1.0182733 0.9604082
FACTOR VARIABLES
Exponentiated factor-variable coefficients give the odds of responding
yes relative to the base group of that factor. We use the
levels() function to view the levels and identify the base
group.
To assist with interpretation of factor outputs:
Here is a function to identify all the base levels in each factor group. Useful to refer to during analysis of different logistic models.
extract_base_groups <- function(df) {
# extract factor vars only and list to print values
factor_vars <- names(final_data)[sapply(final_data, is.factor)]
base_groups <- list()
# loop through values to check all levels
for (var in factor_vars) {
if (length(levels(df[[var]])) > 0) { # Check step - should be TRUE for all selected.
contrast_matrix <- contrasts(df[[var]])
if (is.null(contrast_matrix)) {
base_groups[[var]] <- levels(df[[var]])[1] # check only - shouldn't ever run
} else {
base_group_level <- levels(df[[var]])[rowSums(abs(contrast_matrix)) == 0] # base group is the one with all zeroes, so we extract that
base_groups[[var]] <- base_group_level
}
} else {
base_groups[[var]] <- NA # Indicate non-factor or factor with no levels.
}
}
return(base_groups)
}
# run function with 'final data' and print a tidy output
factor_bases <- extract_base_groups(final_data)
factor_bases <- as_tibble(factor_bases) |>
pivot_longer(cols = everything(),
names_to = "vars",
values_to = "base_group")
print(factor_bases, n=Inf)
## # A tibble: 23 × 2
## vars base_group
## <chr> <chr>
## 1 gender Man
## 2 community Rural area or village
## 3 marital_status (Re-)Married (1-4 in d7)
## 4 occupation Self-employed (5 to 9 in d15a)
## 5 social_class The working class of society
## 6 religion Catholic
## 7 nonEU_national 0
## 8 phone_access Mobile only
## 9 bill_issues Most of the time
## 10 internet_use Everyday/Almost everyday
## 11 life_sat Very satisfied
## 12 polintr Strong
## 13 left_right (1 - 4) Left
## 14 ethnic_minority 0
## 15 skincolor_minority 0
## 16 religious_minority 0
## 17 roma_minority 0
## 18 sexual_minority 0
## 19 disability_minority 0
## 20 suffered_discr 0
## 21 gender_docs Yes
## 22 friends_trans Yes
## 23 trans_docs Yes
Model performance - Simple logistic
Here we test the model's ability to classify people as supporting or not supporting the rights.
## [1] 19544.18
# predicted probabilities of being a 1 (i.e. supporting trans document change)
predicted_probs <- simple_logistic$fitted.values
# same by running
# predicted_probs <- predict(simple_logistic, type = "response")
head(predicted_probs)
## 1 2 3 4 5 6
## 0.3463520 0.4321645 0.5170989 0.5799662 0.9141149 0.5131253
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Area under the curve: 0.8937
# Turning them to classes using custom threshold
predicted_classes <- ifelse(predicted_probs > 0.5, 1, 0)
head(predicted_classes)
## 1 2 3 4 5 6
## 0 0 1 1 1 1
Get the confusion matrix:
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 7608 2087
## 1 2246 12217
##
## Accuracy : 0.8206
## 95% CI : (0.8157, 0.8255)
## No Information Rate : 0.5921
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.6277
##
## Mcnemar's Test P-Value : 0.01638
##
## Sensitivity : 0.7721
## Specificity : 0.8541
## Pos Pred Value : 0.7847
## Neg Pred Value : 0.8447
## Prevalence : 0.4079
## Detection Rate : 0.3149
## Detection Prevalence : 0.4013
## Balanced Accuracy : 0.8131
##
## 'Positive' Class : 0
##
The simple model predicted people's responses with 82% accuracy, which is quite decent for a basic first model and shows the response is fairly explainable with the data we have. The model is slightly better at predicting those who do not support than those who do, as seen in the higher specificity than sensitivity (85% vs 77%). The two are nonetheless comparable, which is good and confirms we do not have a serious class imbalance.
Kappa is essentially an accuracy score that also accounts for the possibility of being right by chance. A value of about 0.63 is considered quite good.
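The kappa in the confusion-matrix output can be recovered directly from the cell counts with the standard formula, kappa = (p_o − p_e) / (1 − p_e); a base-R sketch using the counts shown above:

```r
# Cohen's kappa from the confusion matrix above (rows = predicted, cols = reference)
cm <- matrix(c(7608, 2246, 2087, 12217), nrow = 2)
n  <- sum(cm)
po <- sum(diag(cm)) / n                      # observed accuracy
pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
round(kappa, 4)  # 0.6277, matching the caret output
```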
plot.roc(datalr$trans_docs, predicted_probs,
col="darkblue",
print.auc = TRUE,
auc.polygon=TRUE,
max.auc.polygon=TRUE,
auc.polygon.col="lightblue",
print.thres="best")
This suggests changing the threshold for assigning classes to 0.59.
In any case, we are not primarily interested in the predictive power of this model; we are more interested in understanding which variables are relevant.
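Rather than reading the "best" threshold off the plot, pROC can return it programmatically via `coords()`. A toy sketch with simulated labels and scores (not our fitted probabilities):

```r
library(pROC)
set.seed(1)
# simulated binary labels and a score that is mildly informative
labels <- rbinom(200, 1, 0.5)
scores <- labels * 0.3 + runif(200)
roc_obj <- roc(labels, scores, quiet = TRUE)
# 'best' uses the Youden index (sensitivity + specificity - 1) by default
coords(roc_obj, "best", ret = c("threshold", "sensitivity", "specificity"))
```

Applied to our model, the same call on `roc(datalr$trans_docs, predicted_probs)` would return the 0.59 threshold printed on the plot.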
Stepwise regression
Chooses the best simple logistic model based on the lowest achievable AIC.
##
## Call: glm(formula = trans_docs ~ age + gender + marital_status + occupation +
## religion + phone_access + bill_issues + internet_use + life_sat +
## left_right + ethnic_minority + roma_minority + suffered_discr +
## gender_docs + friends_trans + n_friends_minorities + n_actions_against_discri +
## antilgbtq_rights + minority_discri, family = binomial(link = "logit"),
## data = datalr)
##
## Coefficients:
## (Intercept)
## 2.62467
## age
## 0.20698
## genderWoman
## 0.26440
## marital_statusSingle living with partner (5-8 in d7)
## 0.04042
## marital_statusSingle (9-10 in d7)
## 0.13987
## marital_statusDivorced or separated (11-12 in d7)
## 0.10517
## marital_statusWidow (13-14 in d7)
## -0.10979
## occupationManagers (10 to 12 in d15a)
## 0.01744
## occupationOther white collars (13 or 14 in d15a)
## -0.25937
## occupationManual workers (15 to 18 in d15a)
## -0.14375
## occupationHouse persons (1 in d15a)
## 0.05889
## occupationUnemployed (3 in d15a)
## -0.12589
## occupationRetired (4 in d15a)
## -0.30665
## occupationStudents (2 in d15a)
## -0.29364
## religionOrthodox Christian
## -0.12432
## religionProtestant
## 0.08546
## religionOther Christian
## -0.13836
## religionOther
## 0.06244
## religionMuslim
## -0.24374
## religionNon-believers
## 0.13104
## phone_accessLandline only
## -0.09735
## phone_accessLandline & mobile
## 0.27311
## phone_accessNo telephone
## -0.19336
## bill_issuesFrom time to time
## -0.34079
## bill_issuesAlmost never/never
## -0.22203
## internet_useTwo or three times a week
## -0.41597
## internet_useAbout once a week
## -0.31328
## internet_useTwo or three times a month
## -0.09201
## internet_useLess often
## 0.02229
## internet_useNever/No access
## -0.33028
## internet_useNo Internet access at all
## -0.02332
## life_satFairly satisfied
## 0.05142
## life_satNot very satisfied
## -0.14512
## life_satNot at all satisfied
## -0.38741
## left_right(5 - 6) Centre
## 0.05631
## left_right(7 -10) Right
## -0.17298
## ethnic_minority1
## -0.26298
## roma_minority1
## -0.58673
## suffered_discr1
## -0.22381
## gender_docsNo
## -2.65983
## friends_transNo
## -0.13252
## friends_transRefusal (SPONTANEOUS)
## -0.79810
## n_friends_minorities
## 0.15912
## n_actions_against_discri
## 0.03980
## antilgbtq_rights
## -0.69753
## minority_discri
## -0.23593
##
## Degrees of Freedom: 24157 Total (i.e. Null); 24112 Residual
## Null Deviance: 32540
## Residual Deviance: 19430 AIC: 19530
The stepwise regression retains 19 of the 29 variables in its best model. This is quite consistent with the significant variables above, so we will not provide further commentary. Compared to the full model, this reduced model does not provide a statistically significant improvement in fit, but it gives us a simpler model and shows which individual-level variables may be most important overall.
## Analysis of Deviance Table
##
## Model 1: trans_docs ~ age + gender + marital_status + occupation + religion +
## phone_access + bill_issues + internet_use + life_sat + left_right +
## ethnic_minority + roma_minority + suffered_discr + gender_docs +
## friends_trans + n_friends_minorities + n_actions_against_discri +
## antilgbtq_rights + minority_discri
## Model 2: trans_docs ~ age + gender + years_edu + community + marital_status +
## occupation + social_class + religion + nonEU_national + phone_access +
## bill_issues + internet_use + life_sat + polintr + left_right +
## social_alienation + ethnic_minority + skincolor_minority +
## religious_minority + roma_minority + sexual_minority + disability_minority +
## suffered_discr + gender_docs + friends_trans + n_friends_minorities +
## n_actions_against_discri + antilgbtq_rights + minority_discri
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 24112 19434
## 2 24096 19420 16 14.121 0.5897
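The drop-in-deviance comparison above can be reproduced in miniature. The following sketch uses simulated toy data (all names here are illustrative, not from the survey): two nested logistic models are compared with anova(..., test = "Chisq"), exactly as done for the stepwise versus full model.

```r
# Sketch: likelihood-ratio (drop-in-deviance) test between nested logistic models.
# x2 is pure noise, so the test should usually find no significant improvement
# from including it.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)                               # irrelevant extra predictor
y  <- rbinom(n, 1, plogis(0.5 + 0.8 * x1))

reduced <- glm(y ~ x1,      family = binomial)
full    <- glm(y ~ x1 + x2, family = binomial)

# Chi-squared test on the reduction in residual deviance
anova(reduced, full, test = "Chisq")
```

The full model's residual deviance can never exceed the reduced model's (the models are nested), so the question is only whether the drop is larger than expected by chance.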
Lasso regularization
We still have too many variables for a mixed model, so here we use Lasso to keep only the most important variables for the mixed models.
Without this step, the glmer stage is too computationally expensive.
lasso_data <- final_data |>
# Doing same preprocessing as done for simple logistic
drop_na() |>
select(-"country") |>
mutate(across(where(is.numeric), scale)) |>
mutate(trans_docs = as.numeric(trans_docs == "Yes"))
# Dummifying all levels to see whether some levels are particularly important
lasso_data <- lasso_data %>%
dummy_cols(select_columns = names(.)[sapply(., is.factor)],
remove_selected_columns = TRUE)
# Dropping one dummy to avoid multicollinearity
# I prefer to do this manually so as to be able to choose the baseline category
lasso_data <- lasso_data |>
select(-c(gender_Man, `community_Large town`,
`marital_status_Single (9-10 in d7)`,
`occupation_Manual workers (15 to 18 in d15a)`,
`social_class_The middle class of society`,
religion_Catholic,
`phone_access_Landline & mobile`,
`bill_issues_From time to time`,
`internet_use_Everyday/Almost everyday`,
`life_sat_Fairly satisfied`,
polintr_Medium,
`left_right_(5 - 6) Centre`,
ethnic_minority_0,
skincolor_minority_0,
religious_minority_0,
roma_minority_0,
sexual_minority_0,
disability_minority_0,
suffered_discr_0,
friends_trans_No))
# Convert data into a matrix for glmnet
x_lasso <- lasso_data |> select(-trans_docs) |> as.matrix()
y_lasso <- lasso_data |> select(trans_docs) |> as.matrix()
set.seed(123)
# Define lambdas
lambda_seq <- 10^seq(-2, -5, length.out = 100)
# Fit the Lasso model (alpha = 1 for Lasso regularization)
lasso_model <- cv.glmnet(x_lasso, y_lasso, family = "binomial", alpha = 1, lambda = lambda_seq)
lasso_model$lambda.1se # The largest lambda within one standard error of lambda.min.
## [1] 0.006579332
# This results in a simpler model with fewer selected features.
# Get all the coefficients
coefficients <- coef(lasso_model, s = "lambda.1se")
# Filter for the non-zero ones
non_zero_features <- rownames(coefficients)[which(coefficients != 0)]
non_zero_features

## [1] "(Intercept)"
## [2] "age"
## [3] "n_friends_minorities"
## [4] "antilgbtq_rights"
## [5] "minority_discri"
## [6] "gender_Woman"
## [7] "occupation_Managers (10 to 12 in d15a)"
## [8] "religion_Orthodox Christian"
## [9] "religion_Non-believers"
## [10] "phone_access_Mobile only"
## [11] "phone_access_Landline only"
## [12] "phone_access_No telephone"
## [13] "bill_issues_Almost never/never"
## [14] "internet_use_Two or three times a week"
## [15] "internet_use_Never/No access"
## [16] "life_sat_Not very satisfied"
## [17] "life_sat_Not at all satisfied"
## [18] "left_right_(7 -10) Right"
## [19] "ethnic_minority_1"
## [20] "roma_minority_1"
## [21] "suffered_discr_1"
## [22] "gender_docs_Yes"
## [23] "friends_trans_Refusal (SPONTANEOUS)"
Lasso selects variables similar to those suggested by our initial logistic regression and the subsequent stepwise selection.
In particular we can see again that:
- age is relevant
- It matters whether you have friends who belong to minorities (n_friends_minorities)
- antilgbtq_rights is also clearly relevant
- It matters how discriminatory you are against minorities (minority_discri)
- gender is relevant
- It matters whether you are a manager (at least compared to the baseline of being a manual worker)
- It matters whether you are an Orthodox Christian (at least compared to the baseline of being Catholic)
- phone_access has three relevant levels compared with the baseline (having both landline and mobile phone access)
- Not being poor seems to matter (bill_issues_Almost never/never compared to a baseline of having bill issues from time to time)
- Not using the internet seems to matter (compared to accessing it every day)
- Not being happy in life seems to matter
- Being right-wing rather than centre seems to matter
- Being from an ethnic minority seems to matter
- Having suffered discrimination seems to matter
- Supporting a third-gender option on civil documents (gender_docs) seems to matter
- Refusing to answer the question on whether you know a transgender person seems to matter
Final variable selection
From these hints above and general intuition, we select the following variables as our individual-level fixed effects in the mixed model:
- age
- gender
- religion: the majority of people in our sample are Catholic, while the most important levels seem to be non-believers, Muslims and Orthodox Christians; instead of using all the levels we will use only these three
- occupation: it is not clear which levels might be relevant, so we do not include any of them
- community: the place you live does not seem to matter, hence no community variable in our final model
- marital_status: it is not clear which levels are important, so we drop this variable
- social_class: our descriptive analysis suggested this might be relevant and, given that we are not using occupation, we include all its levels
- phone_access: it seems like it might be important, but its interpretation is difficult and it is unclear what this variable can be a proxy of, so we exclude it
- bill_issues: recoded as a dummy using the level "Almost never/never". Respondents coded 1 experience difficulty paying bills on a sporadic or constant basis, which can easily be interpreted as a proxy for being poor
- internet_use: on the same logic, recoded as a dummy storing whether one uses the internet every day or not
- life_sat: people who are not satisfied with life ("Not very satisfied" and "Not at all satisfied") seem really different from the rest, so a dummy is created from those two levels
- left_right: right-wing people seem significantly different from centre or left-wing people, so a dummy is created for right-wing respondents
- ethnic_minority and roma_minority both seem quite important. Given that Roma is an ethnic minority, we keep only the former and discard the latter, which carries redundant information
- suffered_discr: whether you have suffered discrimination matters
- n_friends_minorities and friends_trans are very similar concepts. n_friends_minorities is derived by summing the "yes" answers to "do you know a person who belongs to minority x", excluding the transgender minority, whose answers are left as a standalone variable (friends_trans). We use friends_trans directly, as it is more closely related to our target, and discard n_friends_minorities, which likely captures a similar effect
- antilgbtq_rights is clearly a good predictor. As gender_docs captures a similar concept, we use only antilgbtq_rights
- minority_discri is closely related to the previous point, but we keep it as a more general score of how discriminatory a respondent is
And these will be our pre-defined fixed effects.
# Preparing the dataframe
complete_df <- complete_df |>
# Irrelevant variables
select(-c(years_edu, community, marital_status, occupation,
nonEU_national, phone_access, polintr, social_alienation,
gender_docs, n_friends_minorities,
n_actions_against_discri, skincolor_minority,
religious_minority, roma_minority, sexual_minority,
disability_minority)) |>
# Religion
mutate(non_believer = ifelse(religion == "Non-believers", 1, 0),
muslim = ifelse(religion == "Muslim", 1, 0),
orthodox = ifelse(religion == "Orthodox Christian", 1, 0)) |>
select(-religion) |>
# Bill issues
mutate(poor = ifelse(bill_issues != "Almost never/never", 1, 0)) |>
select(-bill_issues) |>
# Internet use
mutate(everyday_internet = ifelse(internet_use == "Everyday/Almost everyday", 1, 0)) |>
select(-internet_use) |>
# Life satisfaction
mutate(unhappy = ifelse(life_sat %in% c("Not very satisfied", "Not at all satisfied"), 1, 0)) |>
select(-life_sat) |>
# Ideology
mutate(right_wing = ifelse(left_right == "(7 -10) Right", 1, 0)) |>
  select(-left_right)

## 'data.frame': 27438 obs. of 21 variables:
## $ country : chr "BE" "BE" "BE" "BE" ...
## $ age : num 51 62 38 29 63 41 48 88 44 45 ...
## $ gender : Factor w/ 2 levels "Man","Woman": 1 2 2 1 2 2 1 1 2 2 ...
## $ social_class : Factor w/ 5 levels "The working class of society",..: 2 3 2 4 4 3 3 3 3 4 ...
## $ ethnic_minority : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "ARE YOU PART OF X MINORITY"
## $ suffered_discr : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "HAVE YOU BEEN SUBJECT TO DISCRIMINATION"
## $ friends_trans : Factor w/ 3 levels "Yes","No","Refusal (SPONTANEOUS)": 2 2 2 2 2 2 2 2 2 2 ...
## ..- attr(*, "label")= chr "CONTACT: TRANSGENDER/TRANSSEXUAL"
## $ antilgbtq_rights : num 2 2.33 1.33 1.33 2.67 1.33 1.33 2.33 1.67 1 ...
## ..- attr(*, "label")= chr "HIGHER -> THEY OPPOSE MORE RIGHTS TO LGBTQ"
## $ minority_discri : num 3.38 3.75 3.38 1 3.12 ...
## $ trans_docs : Factor w/ 2 levels "Yes","No": 1 1 1 1 1 1 1 2 1 1 ...
## ..- attr(*, "label")= chr "TRANSGENDER - CHANGE CIVIL DOCUMENTS"
## $ gdp_pc_ppp : num 60452 60452 60452 60452 60452 ...
## $ gender_inequality_index: num 0.048 0.048 0.048 0.048 0.048 0.048 0.048 0.048 0.048 0.048 ...
## $ lgbt_policy_index : num 10 10 10 10 10 10 10 10 10 10 ...
## $ democracy_index : num 76.4 76.4 76.4 76.4 76.4 76.4 76.4 76.4 76.4 76.4 ...
## $ non_believer : num 0 0 1 1 0 1 1 0 0 0 ...
## $ muslim : num 0 0 0 0 0 0 0 0 0 0 ...
## $ orthodox : num 0 0 0 0 0 0 0 0 0 0 ...
## $ poor : num 1 0 1 0 0 0 1 1 0 0 ...
## $ everyday_internet : num 1 1 1 1 1 1 1 0 1 1 ...
## $ unhappy : num 0 0 0 0 0 0 0 0 0 0 ...
## $ right_wing : num 1 1 0 1 1 1 0 1 0 1 ...
Mixed model
Random Intercept Model:
Allows the baseline level (intercept) to vary across groups. Assumes the relationship between predictors and outcome (slope) is the same for all groups
In R syntax: y ~ x + (1|group)
Example: Different schools might have different average test scores (random intercepts), but the effect of study hours on test scores is the same across all schools
Random Slope Model:
Allows the effect of a predictor (slope) to vary across groups. Can be used with or without random intercepts.
In R syntax: y ~ x + (0+x|group) for random slope only, or y ~ x + (1+x|group) for both random intercept and slope
Example: The effect of study hours on test scores might be stronger in some schools than others (random slopes)
Random intercept models account for different baselines between groups, while random slope models account for different relationships between predictors and outcomes across groups. When both random intercepts and slopes are included, you’re allowing both the baseline and the effect of predictors to vary by group.
Interactions between group-level (higher-level) variables and individual-level (lower-level) variables in mixed models are called cross-level interactions.
Cross-level interactions are particularly useful in multilevel research because they help you understand how the relationship between an individual-level predictor and the outcome varies as a function of a group-level characteristic.
For example, in educational research:
Individual level: student characteristics (study time, prior knowledge).
Group level: school or classroom characteristics (class size, teaching method).
Cross-level interaction: Does the effect of study time on performance depend on class size?
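To make these specifications concrete, here is an illustrative sketch using lme4's built-in sleepstudy data, where Subject plays the role of the grouping variable (country in our case) and Days the role of an individual-level predictor:

```r
library(lme4)

data(sleepstudy)  # reaction times over days of sleep deprivation, per subject

# Random intercept: subjects differ in baseline reaction time,
# but share one common Days slope
m_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Random intercept + random slope: the Days effect also varies by subject
m_slope <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# A cross-level interaction would multiply an individual-level predictor by a
# group-level variable, e.g. y ~ x * group_var + (1 + x | group)

anova(m_int, m_slope)  # does allowing slope variation improve fit?
```

This mirrors the stepwise build-up we follow below: intercept first, then slopes, then interactions.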
Running the glmer models
For modelling, we follow the approach to multilevel model building of Hox (2010):
1/ Null Model (Random Intercept only)
2/ Add independent Level 1 variables
3/ Add independent Level 2 variables
4/ Add random slopes
5/ (Cross-level) interactions
At each step, we check whether the model is significantly improved compared to the previous one.
Adjusted ICC: The proportion of total variance explained by the grouping structure (random effects), adjusted to include the variance explained by the fixed effects
Unadjusted ICC: The proportion of total variance explained by the grouping structure (random effects), excluding any variance explained by the fixed effects in the model
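For a logistic mixed model these ICCs are computed on the latent scale, where the level-1 residual variance is fixed at pi^2/3 (≈ 3.29). A minimal sketch, plugging in the random-intercept SD of 0.98 reported for the null model below:

```r
# Latent-scale ICC for a logistic random-intercept model:
#   ICC = var_country / (var_country + pi^2 / 3)
sd_intercept <- 0.98                 # random-intercept SD from the null model
var_country  <- sd_intercept^2
icc <- var_country / (var_country + pi^2 / 3)
round(icc, 3)  # ~0.226, matching the ICC performance::icc() reports below
```

In the null model the adjusted and unadjusted ICCs coincide because there are no fixed effects to explain any variance.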
complete_df <- complete_df |>
drop_na() |> # we have NAs only in our target
mutate(trans_docs = as.numeric(trans_docs == "Yes"),
ethnic_minority = as.numeric(ethnic_minority) - 1,
suffered_discr = as.numeric(suffered_discr) - 1)
# Scaling the variables is necessary, otherwise the model cannot converge
# However, we lose some interpretability
scaled_df <- complete_df |>
mutate(across(c(age, antilgbtq_rights, minority_discri, gdp_pc_ppp,
gender_inequality_index, lgbt_policy_index,
                democracy_index), ~scale(.x, center = TRUE, scale = TRUE)))

# Null model: random intercept only
glmer_step1 <- glmer(trans_docs ~ (1|country),
data = scaled_df,
family=binomial(link="logit"))
# Printing coefficients
nice_table(broom.mixed::tidy(glmer_step1))

effect | group | Term | estimate | std.error | statistic | p |
|---|---|---|---|---|---|---|
fixed | (Intercept) | 0.45 | 0.19 | 2.41 | .016* | |
ran_pars | country | sd__(Intercept) | 0.98 |
## $country
## (Intercept)
## AT 0.020782239
## BE 0.531337356
## BG -2.038432593
## CY -0.290724299
## CZ -0.658351325
## DE 0.875561536
## DK 0.910532765
## EE 0.009170019
## ES 1.763206747
## FI 0.443774385
## FR 0.870511333
## GB 0.402502267
## GR -0.051877595
## HR -0.631306154
## HU -2.093833738
## IE 0.554181537
## IT -0.539249992
## LT -0.741686699
## LU 0.854896971
## LV -0.510739697
## MT 1.282868521
## NL 1.406610450
## PL -0.447199502
## PT 0.782044710
## RO -1.599375280
## SE 0.500727881
## SI -0.334788326
## SK -1.293857998
##
## with conditional variances for "country"
## [1] 28212.65
## Area under the curve: 0.7405
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.227
## Unadjusted ICC: 0.227
…
# Random intercept + individual fixed effects
glmer_step2 <- glmer(trans_docs ~ age + gender + social_class +
ethnic_minority + suffered_discr +
friends_trans + antilgbtq_rights +
minority_discri + non_believer + muslim +
orthodox + poor + everyday_internet +
unhappy + right_wing + (1|country),
data = scaled_df,
family = binomial(link="logit"))
# Printing coefficients
nice_table(broom.mixed::tidy(glmer_step2))

effect | group | Term | estimate | std.error | statistic | p |
|---|---|---|---|---|---|---|
fixed | (Intercept) | 0.57 | 0.15 | 3.83 | < .001*** | |
fixed | age | 0.09 | 0.02 | 4.50 | < .001*** | |
fixed | genderWoman | 0.30 | 0.03 | 9.09 | < .001*** | |
fixed | social_classThe lower middle class of society | 0.02 | 0.05 | 0.47 | .638 | |
fixed | social_classThe middle class of society | 0.05 | 0.04 | 1.21 | .228 | |
fixed | social_classThe upper middle class of society | -0.10 | 0.07 | -1.35 | .177 | |
fixed | social_classThe higher class of society | 0.40 | 0.23 | 1.76 | .078 | |
fixed | ethnic_minority | -0.26 | 0.09 | -2.76 | .006** | |
fixed | suffered_discr | -0.06 | 0.05 | -1.41 | .159 | |
fixed | friends_transNo | -0.48 | 0.06 | -7.70 | < .001*** | |
fixed | friends_transRefusal (SPONTANEOUS) | -0.81 | 0.16 | -5.05 | < .001*** | |
fixed | antilgbtq_rights | -0.93 | 0.02 | -41.41 | < .001*** | |
fixed | minority_discri | -0.33 | 0.02 | -16.02 | < .001*** | |
fixed | non_believer | 0.16 | 0.05 | 3.43 | .001*** | |
fixed | muslim | -0.11 | 0.14 | -0.77 | .438 | |
fixed | orthodox | 0.04 | 0.08 | 0.55 | .585 | |
fixed | poor | -0.04 | 0.04 | -0.99 | .321 | |
fixed | everyday_internet | 0.42 | 0.04 | 9.30 | < .001*** | |
fixed | unhappy | -0.18 | 0.05 | -3.80 | < .001*** | |
fixed | right_wing | -0.25 | 0.04 | -6.81 | < .001*** | |
ran_pars | country | sd__(Intercept) | 0.66 |
## $country
## (Intercept)
## AT -0.02120803
## BE 0.22957727
## BG -1.11582649
## CY 0.19510952
## CZ -0.48793018
## DE 0.37873026
## DK 0.05356004
## EE 0.58147710
## ES 1.10242641
## FI 0.13357975
## FR 0.29652252
## GB -0.50033006
## GR 0.63616091
## HR -0.23104448
## HU -1.65716699
## IE 0.02869078
## IT -0.46469161
## LT 0.07106727
## LU 0.16907201
## LV 0.33434733
## MT 1.46642536
## NL 0.23828124
## PL -0.04229810
## PT 0.99027118
## RO -0.84670653
## SE -0.75220030
## SI -0.31388527
## SK -0.49255097
##
## with conditional variances for "country"
## [1] 23612.7
## Area under the curve: 0.8389
# Turning probabilities into classes using a 0.5 threshold
predicted_classes <- ifelse(predicted_probs > 0.5, 1, 0)
# ICC
performance::icc(glmer_step2)

## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.118
## Unadjusted ICC: 0.080
## Data: scaled_df
## Models:
## glmer_step1: trans_docs ~ (1 | country)
## glmer_step2: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## glmer_step1 2 28213 28229 -14104 28209
## glmer_step2 21 23613 23783 -11785 23571 4637.9 19 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
…
# Random intercept + individual and country fixed effects
glmer_step3 <- glmer(trans_docs ~ age + gender + social_class +
ethnic_minority + suffered_discr +
friends_trans + antilgbtq_rights +
minority_discri + non_believer + muslim +
orthodox + poor + everyday_internet +
unhappy + right_wing + gdp_pc_ppp +
lgbt_policy_index + gender_inequality_index +
democracy_index + (1|country),
data = scaled_df,
family = binomial(link="logit"))
# Printing coefficients
nice_table(broom.mixed::tidy(glmer_step3))

effect | group | Term | estimate | std.error | statistic | p |
|---|---|---|---|---|---|---|
fixed | (Intercept) | 0.58 | 0.14 | 4.21 | < .001*** | |
fixed | age | 0.09 | 0.02 | 4.41 | < .001*** | |
fixed | genderWoman | 0.30 | 0.03 | 9.12 | < .001*** | |
fixed | social_classThe lower middle class of society | 0.03 | 0.05 | 0.48 | .629 | |
fixed | social_classThe middle class of society | 0.05 | 0.04 | 1.22 | .221 | |
fixed | social_classThe upper middle class of society | -0.10 | 0.07 | -1.36 | .173 | |
fixed | social_classThe higher class of society | 0.40 | 0.23 | 1.77 | .077 | |
fixed | ethnic_minority | -0.26 | 0.09 | -2.77 | .006** | |
fixed | suffered_discr | -0.07 | 0.05 | -1.44 | .151 | |
fixed | friends_transNo | -0.47 | 0.06 | -7.68 | < .001*** | |
fixed | friends_transRefusal (SPONTANEOUS) | -0.81 | 0.16 | -5.04 | < .001*** | |
fixed | antilgbtq_rights | -0.92 | 0.02 | -41.25 | < .001*** | |
fixed | minority_discri | -0.32 | 0.02 | -15.97 | < .001*** | |
fixed | non_believer | 0.15 | 0.05 | 3.41 | .001*** | |
fixed | muslim | -0.11 | 0.14 | -0.79 | .432 | |
fixed | orthodox | 0.06 | 0.08 | 0.76 | .449 | |
fixed | poor | -0.04 | 0.04 | -0.95 | .342 | |
fixed | everyday_internet | 0.42 | 0.04 | 9.27 | < .001*** | |
fixed | unhappy | -0.18 | 0.05 | -3.76 | < .001*** | |
fixed | right_wing | -0.25 | 0.04 | -6.81 | < .001*** | |
fixed | gdp_pc_ppp | -0.04 | 0.14 | -0.28 | .779 | |
fixed | lgbt_policy_index | 0.22 | 0.16 | 1.35 | .177 | |
fixed | gender_inequality_index | -0.11 | 0.16 | -0.65 | .514 | |
fixed | democracy_index | 0.01 | 0.22 | 0.06 | .948 | |
ran_pars | country | sd__(Intercept) | 0.59 |
## $country
## (Intercept)
## AT -0.18772528
## BE -0.02995288
## BG -0.77925182
## CY 0.67565403
## CZ -0.36218234
## DE 0.18443394
## DK -0.26115206
## EE 0.60416818
## ES 0.78914762
## FI -0.25378064
## FR 0.09224124
## GB -0.76869694
## GR 0.68311680
## HR -0.16485018
## HU -1.29103673
## IE 0.03261993
## IT -0.36310134
## LT 0.36892669
## LU 0.21176279
## LV 0.78402479
## MT 1.13090540
## NL -0.13536533
## PL 0.25822329
## PT 0.72875910
## RO -0.44018566
## SE -1.15290900
## SI -0.31157003
## SK -0.06296424
##
## with conditional variances for "country"
## [1] 23614.63
## Area under the curve: 0.8389
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.097
## Unadjusted ICC: 0.060
## Data: scaled_df
## Models:
## glmer_step1: trans_docs ~ (1 | country)
## glmer_step2: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + (1 | country)
## glmer_step3: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + gdp_pc_ppp + lgbt_policy_index + gender_inequality_index + democracy_index + (1 | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## glmer_step1 2 28213 28229 -14104 28209
## glmer_step2 21 23613 23783 -11785 23571 4637.948 19 <2e-16 ***
## glmer_step3 25 23615 23817 -11782 23565 6.069 4 0.1941
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
…
# Adding random slopes and cross level interaction
glmer_step4 <- glmer(trans_docs ~
# Individual level variables
age + gender + social_class +
ethnic_minority + suffered_discr +
friends_trans + antilgbtq_rights +
minority_discri + non_believer + muslim +
orthodox + poor + everyday_internet +
unhappy + right_wing +
# Country level variables
gdp_pc_ppp + gender_inequality_index +
lgbt_policy_index + democracy_index +
# Cross-level interactions
right_wing:lgbt_policy_index +
social_class:gdp_pc_ppp +
gender:gender_inequality_index +
gender:lgbt_policy_index +
age:gdp_pc_ppp +
non_believer:democracy_index +
non_believer:gender_inequality_index +
unhappy:lgbt_policy_index +
poor:democracy_index +
(1 + gdp_pc_ppp +
lgbt_policy_index + gender_inequality_index +
democracy_index |country),
data = scaled_df,
family = binomial(link="logit"))
# Printing coefficients
nice_table(broom.mixed::tidy(glmer_step4))

effect | group | Term | estimate | std.error | statistic | p |
|---|---|---|---|---|---|---|
fixed | (Intercept) | 0.54 | 0.12 | 4.39 | < .001*** | |
fixed | age | 0.09 | 0.02 | 4.56 | < .001*** | |
fixed | genderWoman | 0.31 | 0.03 | 9.33 | < .001*** | |
fixed | social_classThe lower middle class of society | 0.03 | 0.05 | 0.64 | .524 | |
fixed | social_classThe middle class of society | 0.04 | 0.04 | 1.00 | .316 | |
fixed | social_classThe upper middle class of society | -0.06 | 0.08 | -0.78 | .436 | |
fixed | social_classThe higher class of society | 0.40 | 0.23 | 1.74 | .082 | |
fixed | ethnic_minority | -0.26 | 0.09 | -2.73 | .006** | |
fixed | suffered_discr | -0.07 | 0.05 | -1.58 | .115 | |
fixed | friends_transNo | -0.46 | 0.06 | -7.51 | < .001*** | |
fixed | friends_transRefusal (SPONTANEOUS) | -0.79 | 0.16 | -4.94 | < .001*** | |
fixed | antilgbtq_rights | -0.92 | 0.02 | -41.09 | < .001*** | |
fixed | minority_discri | -0.32 | 0.02 | -15.89 | < .001*** | |
fixed | non_believer | 0.14 | 0.05 | 2.91 | .004** | |
fixed | muslim | -0.10 | 0.14 | -0.74 | .458 | |
fixed | orthodox | 0.06 | 0.08 | 0.71 | .478 | |
fixed | poor | -0.04 | 0.04 | -1.03 | .303 | |
fixed | everyday_internet | 0.42 | 0.04 | 9.30 | < .001*** | |
fixed | unhappy | -0.18 | 0.05 | -3.58 | < .001*** | |
fixed | right_wing | -0.26 | 0.04 | -7.02 | < .001*** | |
fixed | gdp_pc_ppp | -0.07 | 0.12 | -0.59 | .553 | |
fixed | gender_inequality_index | 0.09 | 0.19 | 0.44 | .657 | |
fixed | lgbt_policy_index | 0.10 | 0.15 | 0.69 | .490 | |
fixed | democracy_index | 0.19 | 0.14 | 1.33 | .185 | |
fixed | right_wing × lgbt_policy_index | -0.08 | 0.04 | -2.14 | .033* | |
fixed | social_classThe lower middle class of society × gdp_pc_ppp | 0.05 | 0.06 | 0.81 | .419 | |
fixed | social_classThe middle class of society × gdp_pc_ppp | -0.03 | 0.04 | -0.79 | .428 | |
fixed | social_classThe upper middle class of society × gdp_pc_ppp | -0.12 | 0.07 | -1.87 | .061 | |
fixed | social_classThe higher class of society × gdp_pc_ppp | -0.10 | 0.23 | -0.43 | .668 | |
fixed | genderWoman × gender_inequality_index | -0.03 | 0.05 | -0.66 | .506 | |
fixed | genderWoman × lgbt_policy_index | 0.12 | 0.04 | 2.70 | .007** | |
fixed | age × gdp_pc_ppp | -0.01 | 0.02 | -0.68 | .494 | |
fixed | non_believer × democracy_index | -0.19 | 0.07 | -2.82 | .005** | |
fixed | non_believer × gender_inequality_index | -0.25 | 0.08 | -3.27 | .001** | |
fixed | unhappy × lgbt_policy_index | -0.02 | 0.05 | -0.34 | .737 | |
fixed | poor × democracy_index | -0.02 | 0.04 | -0.49 | .623 | |
ran_pars | country | sd__(Intercept) | 0.37 | |||
ran_pars | country | cor__(Intercept).gdp_pc_ppp | -1.00 | |||
ran_pars | country | cor__(Intercept).lgbt_policy_index | 0.65 | |||
ran_pars | country | cor__(Intercept).gender_inequality_index | 0.96 | |||
ran_pars | country | cor__(Intercept).democracy_index | 1.00 | |||
ran_pars | country | sd__gdp_pc_ppp | 0.40 | |||
ran_pars | country | cor__gdp_pc_ppp.lgbt_policy_index | -0.58 | |||
ran_pars | country | cor__gdp_pc_ppp.gender_inequality_index | -0.93 | |||
ran_pars | country | cor__gdp_pc_ppp.democracy_index | -1.00 | |||
ran_pars | country | sd__lgbt_policy_index | 0.26 | |||
ran_pars | country | cor__lgbt_policy_index.gender_inequality_index | 0.84 | |||
ran_pars | country | cor__lgbt_policy_index.democracy_index | 0.59 | |||
ran_pars | country | sd__gender_inequality_index | 0.49 | |||
ran_pars | country | cor__gender_inequality_index.democracy_index | 0.93 | |||
ran_pars | country | sd__democracy_index | 0.50 |
## $country
## (Intercept) gdp_pc_ppp lgbt_policy_index gender_inequality_index
## AT -0.016860767 0.021024603 0.007942770 -0.010202758
## BE -0.318645933 0.350373654 -0.093472325 -0.368521299
## BG -0.381823921 0.407843015 -0.174238995 -0.486478311
## CY 0.113326990 -0.120968070 0.052137516 0.144693727
## CZ -0.322006634 0.348031690 -0.125769878 -0.394993579
## DE 0.176687421 -0.186025620 0.094642869 0.235224610
## DK -0.145906122 0.158827470 -0.051133493 -0.174754330
## EE 0.501648256 -0.550035761 0.155254659 0.586010045
## ES 0.660871882 -0.698200442 0.341547730 0.870841169
## FI -0.096044784 0.102328703 -0.045182372 -0.123346315
## FR 0.145401701 -0.151796164 0.084576489 0.198400762
## GB -0.271677358 0.283788550 -0.157180475 -0.370092823
## GR 0.449870700 -0.484483366 0.184770461 0.558374173
## HR -0.004747601 0.005069061 -0.002176907 -0.006056406
## HU -0.604316013 0.632184648 -0.344812021 -0.819754973
## IE -0.007129522 0.007842187 -0.002077747 -0.008235570
## IT 0.153022794 -0.150853576 0.135163905 0.242091726
## LT 0.260392738 -0.304919619 -0.020084154 0.231565982
## LU -0.168717723 0.181133761 -0.072226276 -0.211524360
## LV 0.304704559 -0.336950325 0.079495279 0.345265756
## MT 0.321186150 -0.328214709 0.223632170 0.464807871
## NL 0.040427632 -0.042379489 0.022613723 0.054512886
## PL -0.285214087 0.279589087 -0.260133026 -0.457145309
## PT 0.486407858 -0.513634172 0.252667506 0.641874319
## RO -0.333930830 0.351233884 -0.180662106 -0.445855890
## SE -0.601208592 0.640623913 -0.282411572 -0.771807665
## SI 0.058289519 -0.024920872 0.220269586 0.213963660
## SK -0.123766686 0.133718694 -0.048605441 -0.152010773
## democracy_index
## AT -0.025690625
## BE -0.439981884
## BG -0.515582161
## CY 0.152947989
## CZ -0.438768264
## DE 0.235963490
## DK -0.199907310
## EE 0.691154467
## ES 0.884912402
## FI -0.129437552
## FR 0.192930798
## GB -0.360642040
## GR 0.611303116
## HR -0.006408749
## HU -0.803108495
## IE -0.009847038
## IT 0.194414435
## LT 0.377581407
## LU -0.228712894
## LV 0.422580120
## MT 0.419294827
## NL 0.053811320
## PL -0.360828742
## PT 0.651063444
## RO -0.445624594
## SE -0.810313912
## SI 0.042502161
## SK -0.168595816
##
## with conditional variances for "country"
## [1] 23606.46
## Area under the curve: 0.8394
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.094
## Unadjusted ICC: 0.060
## Data: scaled_df
## Models:
## glmer_step1: trans_docs ~ (1 | country)
## glmer_step2: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + (1 | country)
## glmer_step3: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + gdp_pc_ppp + lgbt_policy_index + gender_inequality_index + democracy_index + (1 | country)
## glmer_step4: trans_docs ~ age + gender + social_class + ethnic_minority + suffered_discr + friends_trans + antilgbtq_rights + minority_discri + non_believer + muslim + orthodox + poor + everyday_internet + unhappy + right_wing + gdp_pc_ppp + gender_inequality_index + lgbt_policy_index + democracy_index + right_wing:lgbt_policy_index + social_class:gdp_pc_ppp + gender:gender_inequality_index + gender:lgbt_policy_index + age:gdp_pc_ppp + non_believer:democracy_index + non_believer:gender_inequality_index + unhappy:lgbt_policy_index + poor:democracy_index + (1 + gdp_pc_ppp + lgbt_policy_index + gender_inequality_index + democracy_index | country)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## glmer_step1 2 28213 28229 -14104 28209
## glmer_step2 21 23613 23783 -11785 23571 4637.948 19 < 2.2e-16 ***
## glmer_step3 25 23615 23817 -11782 23565 6.069 4 0.1940554
## glmer_step4 51 23607 24019 -11752 23505 60.170 26 0.0001591 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Class predictions
predicted_classes <- ifelse(predicted_probs > 0.5, 1, 0)
# Confusion matrix
confusionMatrix(as.factor(scaled_df$trans_docs), as.factor(predicted_classes))

## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 7468 2227
## 1 3433 11030
##
## Accuracy : 0.7657
## 95% CI : (0.7603, 0.771)
## No Information Rate : 0.5488
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.5222
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.6851
## Specificity : 0.8320
## Pos Pred Value : 0.7703
## Neg Pred Value : 0.7626
## Prevalence : 0.4512
## Detection Rate : 0.3091
## Detection Prevalence : 0.4013
## Balanced Accuracy : 0.7585
##
## 'Positive' Class : 0
##
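As a sanity check, the headline statistics can be recomputed by hand from the four cells printed above (recall that caret treats "0" as the positive class here):

```r
# Cells of the confusion matrix above: rows = Prediction, columns = Reference
p0_r0 <- 7468;  p0_r1 <- 2227
p1_r0 <- 3433;  p1_r1 <- 11030
total <- p0_r0 + p0_r1 + p1_r0 + p1_r1

accuracy    <- (p0_r0 + p1_r1) / total   # correct predictions over all cases
sensitivity <- p0_r0 / (p0_r0 + p1_r0)   # positive class is "0"
specificity <- p1_r1 / (p0_r1 + p1_r1)

round(c(accuracy, sensitivity, specificity), 4)
# 0.7657 0.6851 0.8320 -- matching the caret output
```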
plot.roc(scaled_df$trans_docs, predicted_probs,
col="darkblue",
print.auc = TRUE,
auc.polygon=TRUE,
max.auc.polygon=TRUE,
auc.polygon.col="lightblue",
         print.thres="best")

right_wing:lgbt_policy_index: Tests whether the effect of individual political ideology varies based on a country's LGBT policy environment. The relationship between conservative views and trans attitudes might be weaker in countries with strong LGBT protections.

social_class:gdp_pc_ppp: Tests whether social class effects differ across countries with varying levels of economic development.

gender:gender_inequality_index: Tests how the relationship between being female and attitudes toward transgender people varies depending on the level of gender inequality in the country.

gender:lgbt_policy_index: Tests how the relationship between an individual's gender and their attitudes toward transgender people varies depending on the LGBTQ policy environment of their country.

age:gdp_pc_ppp: Tests how the relationship between an individual's age and their attitudes toward transgender people varies depending on the economic development level of their country.

non_believer:democracy_index: Tests how the relationship between religiosity (being a non-believer) and attitudes toward transgender people varies depending on the democratic context of the country.

non_believer:gender_inequality_index: Tests how the relationship between religiosity (being a non-believer) and attitudes toward transgender people varies depending on the level of gender inequality in the country.

unhappy:lgbt_policy_index: Tests how the relationship between an individual's life dissatisfaction and their attitudes toward transgender people varies depending on the LGBTQ policy environment of their country.

poor:democracy_index: Tests how the relationship between an individual's economic status (being poor) and their attitudes toward transgender people varies depending on the democratic context of their country.
Interpreting results
…
Predictive model
Developing a Predictive Model for Other Countries
Summarize the individual data to the country level (e.g., percentage of the population living in rural areas by country, percentage of people from minority x by country). The target variable becomes the percentage of people in country x who support the right of transgender people to change their civil documents.
Then run random forest, gradient boosting, and elastic-net linear regression with leave-one-out cross-validation.
Leave-One-Out Cross-Validation (LOOCV): LOOCV is a special case of k-fold cross-validation where k equals the number of observations (n). That means: for each of the 28 observations, the model is trained on the remaining 27 rows and tested on the left-out row. This process repeats 28 times (once per row), and the final performance metric (e.g., accuracy, RMSE) is the average of all 28 test results.
We switched to country-level predictions because this is how we understood the task.
Given that switching to country-level data leaves us with only 28 observations, it does not make sense to split our dataframe into training and test sets for model comparison. Instead, we exploit the small number of rows to run leave-one-out cross-validation both for hyperparameter tuning and for comparing our models.
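The LOOCV mechanics described above can be illustrated with a minimal hand-rolled loop on toy data (caret automates this for the actual models; this sketch only shows what happens in each fold).

```r
# Minimal LOOCV sketch on simulated toy data with a plain linear model.
set.seed(123)
n <- 28                                      # one row per EU country
toy <- data.frame(x = rnorm(n))
toy$y <- 0.5 + 0.3 * toy$x + rnorm(n, sd = 0.1)

errors <- numeric(n)
for (i in seq_len(n)) {
  fit  <- lm(y ~ x, data = toy[-i, ])        # train on the other 27 rows
  pred <- predict(fit, newdata = toy[i, ])   # predict the held-out row
  errors[i] <- toy$y[i] - pred
}
rmse_loocv <- sqrt(mean(errors^2))           # average over all 28 folds
rmse_loocv
```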
It is also worth noting that the generalizability of our model is
very limited: partly because of the few observations, and partly because
it is trained only on data from European Union countries, which share
many similarities. The models we derive could be used to predict support
for trans_docs in other European countries, but their
predictions are likely to be far off for countries outside Europe or
with a socio-economic structure completely different from that of EU
countries.
Aggregating individual data to country level
We build a dataframe with 28 rows (one per country), summarizing all our variables at the country level.
In the next steps we will synthesize the information from the individual respondents into country-level variables.
Using means for numeric variables:
# using final_data which contains individual level data only
names(select(final_data, where(is.numeric)))
## [1] "age" "years_edu"
## [3] "social_alienation" "n_friends_minorities"
## [5] "n_actions_against_discri" "antilgbtq_rights"
## [7] "minority_discri"
data_num <- final_data |>
group_by(country) |>
summarise(across(where(is.numeric), ~mean(.), .names = "mean_{.col}"))
As well as for binary dummy variables:
# Searching for factor variables with two levels
dummy_names <- names(final_data)[sapply(final_data, is.factor) &
  sapply(final_data, function(x) length(levels(x)) == 2)]
dummy_names
## [1] "gender" "nonEU_national" "ethnic_minority"
## [4] "skincolor_minority" "religious_minority" "roma_minority"
## [7] "sexual_minority" "disability_minority" "suffered_discr"
## [10] "gender_docs" "trans_docs"
## $gender
## [1] "Man" "Woman"
##
## $nonEU_national
## [1] "0" "1"
##
## $ethnic_minority
## [1] "0" "1"
##
## $skincolor_minority
## [1] "0" "1"
##
## $religious_minority
## [1] "0" "1"
##
## $roma_minority
## [1] "0" "1"
##
## $sexual_minority
## [1] "0" "1"
##
## $disability_minority
## [1] "0" "1"
##
## $suffered_discr
## [1] "0" "1"
##
## $gender_docs
## [1] "Yes" "No"
##
## $trans_docs
## [1] "Yes" "No"
data_dummy <- final_data %>%
select(country, all_of(dummy_names)) %>%
mutate(across(all_of(dummy_names),
~ case_when(
# convert "0"/"1" to 0/1
. %in% c("0", "1") ~ as.numeric(as.character(.)),
# convert "Yes"/"No" to 1/0
. %in% c("Yes", "No") ~ recode(., "Yes" = 1, "No" = 0),
# convert "Man"/"Woman" to 1/0
. %in% c("Man", "Woman") ~ recode(., "Man" = 1, "Woman" = 0),
TRUE ~ NA
),
.names = "{.col}_num"))
We converted factors with 2 levels to numeric 0/1 so we can compute their means.
# Country means
data_dummy <- data_dummy %>%
group_by(country) %>%
summarise(
across(ends_with("_num"), ~ mean(.x, na.rm = TRUE)),
.groups = "drop"
)
# substitute the "_num" suffix with a "mean_" prefix so it has the same format as the variables in data_num
data_dummy <- data_dummy %>%
rename_with(~ paste0("mean_", str_remove(., "_num$")), ends_with("_num"))
Now we can join the two datasets.
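The join itself can be sketched on toy stand-ins for data_num and data_dummy (the country codes and values below are illustrative, not the real data):

```r
library(dplyr)

# Toy stand-ins for the two country-level summaries
data_num_toy   <- tibble(country = c("AT", "BE"), mean_age = c(47.1, 45.3))
data_dummy_toy <- tibble(country = c("AT", "BE"), mean_trans_docs = c(0.55, 0.74))

# Join on the shared country key; since every country appears in both
# summaries, left_join and full_join give the same result here
data_aggr_toy <- left_join(data_num_toy, data_dummy_toy, by = "country")
```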
The following variables are factors with more than 2 levels:
factor_names <- names(final_data)[!sapply(final_data, is.numeric) &
  !(sapply(final_data, is.factor) &
      sapply(final_data, function(x) length(levels(x)) == 2))]
factor_names
## [1] "country" "community" "marital_status" "occupation"
## [5] "social_class" "religion" "phone_access" "bill_issues"
## [9] "internet_use" "life_sat" "polintr" "left_right"
## [13] "friends_trans"
## $country
## NULL
##
## $community
## [1] "Rural area or village" "Small or middle sized town"
## [3] "Large town"
##
## $marital_status
## [1] "(Re-)Married (1-4 in d7)"
## [2] "Single living with partner (5-8 in d7)"
## [3] "Single (9-10 in d7)"
## [4] "Divorced or separated (11-12 in d7)"
## [5] "Widow (13-14 in d7)"
##
## $occupation
## [1] "Self-employed (5 to 9 in d15a)"
## [2] "Managers (10 to 12 in d15a)"
## [3] "Other white collars (13 or 14 in d15a)"
## [4] "Manual workers (15 to 18 in d15a)"
## [5] "House persons (1 in d15a)"
## [6] "Unemployed (3 in d15a)"
## [7] "Retired (4 in d15a)"
## [8] "Students (2 in d15a)"
##
## $social_class
## [1] "The working class of society" "The lower middle class of society"
## [3] "The middle class of society" "The upper middle class of society"
## [5] "The higher class of society"
##
## $religion
## [1] "Catholic" "Orthodox Christian" "Protestant"
## [4] "Other Christian" "Other" "Muslim"
## [7] "Non-believers"
##
## $phone_access
## [1] "Mobile only" "Landline only" "Landline & mobile"
## [4] "No telephone"
##
## $bill_issues
## [1] "Most of the time" "From time to time" "Almost never/never"
##
## $internet_use
## [1] "Everyday/Almost everyday" "Two or three times a week"
## [3] "About once a week" "Two or three times a month"
## [5] "Less often" "Never/No access"
## [7] "No Internet access at all"
##
## $life_sat
## [1] "Very satisfied" "Fairly satisfied" "Not very satisfied"
## [4] "Not at all satisfied"
##
## $polintr
## [1] "Strong" "Medium" "Low" "Not at all"
##
## $left_right
## [1] "(1 - 4) Left" "(5 - 6) Centre" "(7 -10) Right"
##
## $friends_trans
## [1] "Yes" "No" "Refusal (SPONTANEOUS)"
# removing country from the list
factor_names <- setdiff(factor_names, "country")
# creating the aggregated dataset for factor variables
data_factors <- final_data %>%
select(country, all_of(factor_names)) %>%
pivot_longer(cols = -country, names_to = "variable", values_to = "level") %>%
group_by(country, variable, level) %>%
summarise(count = n(), .groups = "drop") %>%
group_by(country, variable) %>%
mutate(proportion = count / sum(count)) %>%
select(-count) %>%
pivot_wider(names_from = c(variable, level), values_from = proportion, names_glue = "{variable}_{level}")
colSums(is.na(data_factors))
## (all columns have 0 missing values except the following)
## internet_use_About once a week 1
## phone_access_No telephone 1
## religion_Protestant 1
## religion_Other Christian 2
## religion_Muslim 4
## social_class_The higher class of society 3
We see NAs because not all factor levels are represented in every
country. For example, social_class_The higher class of society
has 3 missing values. Let's confirm that there are 3 countries in our
dataframe with no observations for that level:
final_data |> filter(social_class=="The higher class of society") |> pull(country) |> unique() |> length()
## [1] 25
There are indeed 28 − 25 = 3 countries with no observations of
social_class == "The higher class of society". Therefore
those percentages should be 0s, not NAs.
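Since these NAs really mean "no respondents at that level", i.e. a proportion of zero, one way to fix them is tidyr::replace_na(). The sketch below applies it to a toy stand-in for data_factors (the values are illustrative):

```r
library(dplyr)
library(tidyr)

# Toy stand-in: one factor level unobserved in one country
toy_factors <- tibble(
  country = c("AT", "BE"),
  `social_class_The higher class of society` = c(0.02, NA)
)

# Proportions of unobserved levels are truly 0, so replace the NAs
toy_factors <- toy_factors %>%
  mutate(across(-country, ~ replace_na(.x, 0)))
```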
Joining all together
Lastly, we join in the country-level variables obtained from external sources:
data_aggr <- data_aggr |>
left_join(country_level_data, by = c("country" = "iso2c")) |>
select(-c(iso3c, country.y))
## [1] 28 79
The resulting data frame has 28 observations and 79 variables.
Linear regression with Elastic Net
data_models <- data_aggr |>
select(-country)
set.seed(123)
# Define the leave-one-out cross-validation control
ctrl <- trainControl(method = "LOOCV")
# Defining Elastic net grid
elastic_grid <- expand.grid(
alpha = seq(0, 1, length = 10), # Search from Ridge (0) to Lasso (1)
lambda = 10^seq(-5, 1, length = 100) # Regularization strength
)
# Train the Elastic Net model
elastic_net_model <- train(
mean_trans_docs ~ .,
data = data_models,
method = "glmnet",
preProcess = c("center", "scale"),
trControl = ctrl,
tuneGrid = elastic_grid,
metric = "RMSE"
)
# Display the best alpha and lambda values
print(elastic_net_model)
## glmnet
##
## 28 samples
## 77 predictors
##
## Pre-processing: centered (77), scaled (77)
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 27, 27, 27, 27, 27, 27, ...
## Resampling results across tuning parameters:
##
## alpha lambda RMSE Rsquared MAE
## 0.0000000 1.000000e-05 0.12394935 7.474396e-01 0.10254602
## ... ... ... ... ...
## 0.0000000 1.629751e+00 0.12394935 7.474396e-01 0.10254602
## 0.0000000 1.000000e+01 0.17472930 6.293765e-01 0.14328832
## 0.1111111 1.000000e-05 0.09466659 7.913886e-01 0.07970115
## ... ... ... ... ...
## 0.1111111 8.697490e-02 0.07628704 8.777156e-01 0.06349323
## 0.1111111 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.2222222 6.579332e-02 0.07517599 8.846350e-01 0.06189166
## 0.3333333 4.977024e-02 0.07444871 8.869824e-01 0.06085469
## (tuning-grid output truncated: RMSE is constant for small lambda within
## each alpha block; only representative rows, including the row with the
## lowest shown RMSE per alpha, are kept)
## 0.3333333 1.629751e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 1.873817e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 2.154435e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 2.477076e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 2.848036e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 3.274549e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 3.764936e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 4.328761e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 4.977024e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 5.722368e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 6.579332e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 7.564633e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 8.697490e+00 0.21473400 1.000000e+00 0.18063976
## 0.3333333 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.000000e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.149757e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.321941e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.519911e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.747528e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.009233e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.310130e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.656088e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.053856e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.511192e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.037017e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.641589e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 5.336699e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 6.135907e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 7.054802e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 8.111308e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 9.326033e-05 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.072267e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.232847e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.417474e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.629751e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.873817e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.154435e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.477076e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.848036e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.274549e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.764936e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.328761e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.977024e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 5.722368e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 6.579332e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 7.564633e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 8.697490e-04 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.000000e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.149757e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.321941e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.519911e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 1.747528e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.009233e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.310130e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 2.656088e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.053856e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 3.511192e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.037017e-03 0.10187613 7.600810e-01 0.08131377
## 0.4444444 4.641589e-03 0.10129979 7.627933e-01 0.08106900
## 0.4444444 5.336699e-03 0.09985809 7.692614e-01 0.08089223
## 0.4444444 6.135907e-03 0.09810916 7.768586e-01 0.08015686
## 0.4444444 7.054802e-03 0.09709213 7.810430e-01 0.08063937
## 0.4444444 8.111308e-03 0.09585696 7.864484e-01 0.08053740
## 0.4444444 9.326033e-03 0.09477256 7.911081e-01 0.08111341
## 0.4444444 1.072267e-02 0.09377256 7.954759e-01 0.08149153
## 0.4444444 1.232847e-02 0.09157087 8.052997e-01 0.08045958
## 0.4444444 1.417474e-02 0.08857771 8.184565e-01 0.07825513
## 0.4444444 1.629751e-02 0.08498387 8.339876e-01 0.07482244
## 0.4444444 1.873817e-02 0.08128511 8.494285e-01 0.07100574
## 0.4444444 2.154435e-02 0.07877787 8.599645e-01 0.06807750
## 0.4444444 2.477076e-02 0.07716935 8.674920e-01 0.06611575
## 0.4444444 2.848036e-02 0.07583106 8.739533e-01 0.06417930
## 0.4444444 3.274549e-02 0.07441561 8.812169e-01 0.06216941
## 0.4444444 3.764936e-02 0.07364987 8.871083e-01 0.06069637
## 0.4444444 4.328761e-02 0.07400309 8.903419e-01 0.06031029
## 0.4444444 4.977024e-02 0.07583102 8.899883e-01 0.06157279
## 0.4444444 5.722368e-02 0.07860922 8.880001e-01 0.06379943
## 0.4444444 6.579332e-02 0.08217681 8.851279e-01 0.06677378
## 0.4444444 7.564633e-02 0.08621467 8.820433e-01 0.06980842
## 0.4444444 8.697490e-02 0.09085577 8.790785e-01 0.07333704
## 0.4444444 1.000000e-01 0.09657439 8.746014e-01 0.07740811
## 0.4444444 1.149757e-01 0.10304072 8.700372e-01 0.08223201
## 0.4444444 1.321941e-01 0.11029859 8.662942e-01 0.08748956
## 0.4444444 1.519911e-01 0.11909341 8.612336e-01 0.09330699
## 0.4444444 1.747528e-01 0.12948611 8.542171e-01 0.10048383
## 0.4444444 2.009233e-01 0.14132319 8.442903e-01 0.11098924
## 0.4444444 2.310130e-01 0.15485116 8.228920e-01 0.12376873
## 0.4444444 2.656088e-01 0.16948294 7.865849e-01 0.13768361
## 0.4444444 3.053856e-01 0.18367436 7.337676e-01 0.15127172
## 0.4444444 3.511192e-01 0.19759600 5.368501e-01 0.16398502
## 0.4444444 4.037017e-01 0.21006434 3.001946e-01 0.17523664
## 0.4444444 4.641589e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 5.336699e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 6.135907e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 7.054802e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 8.111308e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 9.326033e-01 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.072267e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.232847e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.417474e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.629751e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.873817e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 2.154435e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 2.477076e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 2.848036e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 3.274549e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 3.764936e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 4.328761e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 4.977024e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 5.722368e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 6.579332e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 7.564633e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 8.697490e+00 0.21473400 1.000000e+00 0.18063976
## 0.4444444 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.000000e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.149757e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.321941e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.519911e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.747528e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.009233e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.310130e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.656088e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.053856e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.511192e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 4.037017e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 4.641589e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 5.336699e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 6.135907e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 7.054802e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 8.111308e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 9.326033e-05 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.072267e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.232847e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.417474e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.629751e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.873817e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.154435e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.477076e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.848036e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.274549e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.764936e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 4.328761e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 4.977024e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 5.722368e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 6.579332e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 7.564633e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 8.697490e-04 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.000000e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.149757e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.321941e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.519911e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 1.747528e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.009233e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.310130e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 2.656088e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.053856e-03 0.10203916 7.596293e-01 0.08196821
## 0.5555556 3.511192e-03 0.10202025 7.597784e-01 0.08196503
## 0.5555556 4.037017e-03 0.10115002 7.638500e-01 0.08186834
## 0.5555556 4.641589e-03 0.09966528 7.705344e-01 0.08141440
## 0.5555556 5.336699e-03 0.09801545 7.776418e-01 0.08144020
## 0.5555556 6.135907e-03 0.09643045 7.843329e-01 0.08114494
## 0.5555556 7.054802e-03 0.09501010 7.903305e-01 0.08092887
## 0.5555556 8.111308e-03 0.09389336 7.951196e-01 0.08126808
## 0.5555556 9.326033e-03 0.09211737 8.030030e-01 0.08073272
## 0.5555556 1.072267e-02 0.08943746 8.147392e-01 0.07879214
## 0.5555556 1.232847e-02 0.08626179 8.284253e-01 0.07600788
## 0.5555556 1.417474e-02 0.08270801 8.434216e-01 0.07266230
## 0.5555556 1.629751e-02 0.07964957 8.558466e-01 0.06947575
## 0.5555556 1.873817e-02 0.07756835 8.645515e-01 0.06736981
## 0.5555556 2.154435e-02 0.07612047 8.711480e-01 0.06551994
## 0.5555556 2.477076e-02 0.07460309 8.780236e-01 0.06341290
## 0.5555556 2.848036e-02 0.07321595 8.852317e-01 0.06118777
## 0.5555556 3.274549e-02 0.07310118 8.892109e-01 0.06041040
## 0.5555556 3.764936e-02 0.07466524 8.889747e-01 0.06148892
## 0.5555556 4.328761e-02 0.07712581 8.870748e-01 0.06335138
## 0.5555556 4.977024e-02 0.08025445 8.844418e-01 0.06548466
## 0.5555556 5.722368e-02 0.08373805 8.816299e-01 0.06753122
## 0.5555556 6.579332e-02 0.08772953 8.788041e-01 0.06974201
## 0.5555556 7.564633e-02 0.09241568 8.759303e-01 0.07320786
## 0.5555556 8.697490e-02 0.09774530 8.740699e-01 0.07759985
## 0.5555556 1.000000e-01 0.10413096 8.723209e-01 0.08237496
## 0.5555556 1.149757e-01 0.11195358 8.706161e-01 0.08837128
## 0.5555556 1.321941e-01 0.12124186 8.675884e-01 0.09513259
## 0.5555556 1.519911e-01 0.13218480 8.612045e-01 0.10408594
## 0.5555556 1.747528e-01 0.14514044 8.458722e-01 0.11575051
## 0.5555556 2.009233e-01 0.15921585 8.245187e-01 0.12905247
## 0.5555556 2.310130e-01 0.17353621 7.887863e-01 0.14261476
## 0.5555556 2.656088e-01 0.18851015 6.945366e-01 0.15598606
## 0.5555556 3.053856e-01 0.20307656 2.418466e-01 0.16874798
## 0.5555556 3.511192e-01 0.21439219 9.790342e-01 0.18002653
## 0.5555556 4.037017e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 4.641589e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 5.336699e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 6.135907e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 7.054802e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 8.111308e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 9.326033e-01 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.072267e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.232847e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.417474e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.629751e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.873817e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 2.154435e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 2.477076e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 2.848036e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 3.274549e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 3.764936e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 4.328761e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 4.977024e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 5.722368e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 6.579332e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 7.564633e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 8.697490e+00 0.21473400 1.000000e+00 0.18063976
## 0.5555556 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.000000e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.149757e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.321941e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.519911e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.747528e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.009233e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.310130e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.656088e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 3.053856e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 3.511192e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 4.037017e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 4.641589e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 5.336699e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 6.135907e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 7.054802e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 8.111308e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 9.326033e-05 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.072267e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.232847e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.417474e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.629751e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.873817e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.154435e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.477076e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.848036e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 3.274549e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 3.764936e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 4.328761e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 4.977024e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 5.722368e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 6.579332e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 7.564633e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 8.697490e-04 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.000000e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.149757e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.321941e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.519911e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 1.747528e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.009233e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.310130e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 2.656088e-03 0.10286146 7.560242e-01 0.08439216
## 0.6666667 3.053856e-03 0.10276600 7.564709e-01 0.08446755
## 0.6666667 3.511192e-03 0.10168236 7.615915e-01 0.08392520
## 0.6666667 4.037017e-03 0.09986168 7.701284e-01 0.08296194
## 0.6666667 4.641589e-03 0.09812852 7.780124e-01 0.08276299
## 0.6666667 5.336699e-03 0.09643183 7.852371e-01 0.08220986
## 0.6666667 6.135907e-03 0.09478300 7.918502e-01 0.08165993
## 0.6666667 7.054802e-03 0.09359114 7.966812e-01 0.08180595
## 0.6666667 8.111308e-03 0.09143613 8.061540e-01 0.08066378
## 0.6666667 9.326033e-03 0.08778240 8.216200e-01 0.07703122
## 0.6666667 1.072267e-02 0.08409221 8.369628e-01 0.07327125
## 0.6666667 1.232847e-02 0.08080947 8.504886e-01 0.07033814
## 0.6666667 1.417474e-02 0.07825395 8.604469e-01 0.06786831
## 0.6666667 1.629751e-02 0.07617591 8.689533e-01 0.06595036
## 0.6666667 1.873817e-02 0.07470463 8.754612e-01 0.06422672
## 0.6666667 2.154435e-02 0.07315310 8.824822e-01 0.06191581
## 0.6666667 2.477076e-02 0.07201299 8.889075e-01 0.05982498
## 0.6666667 2.848036e-02 0.07260891 8.907503e-01 0.06015661
## 0.6666667 3.274549e-02 0.07436396 8.896837e-01 0.06172904
## 0.6666667 3.764936e-02 0.07695127 8.871357e-01 0.06382590
## 0.6666667 4.328761e-02 0.07997789 8.841624e-01 0.06594539
## 0.6666667 4.977024e-02 0.08320731 8.818088e-01 0.06791426
## 0.6666667 5.722368e-02 0.08656405 8.815388e-01 0.07029926
## 0.6666667 6.579332e-02 0.09043482 8.825748e-01 0.07289692
## 0.6666667 7.564633e-02 0.09557783 8.831619e-01 0.07661007
## 0.6666667 8.697490e-02 0.10197749 8.839165e-01 0.08142193
## 0.6666667 1.000000e-01 0.10997963 8.834545e-01 0.08728711
## 0.6666667 1.149757e-01 0.11988378 8.791672e-01 0.09506382
## 0.6666667 1.321941e-01 0.13152922 8.714260e-01 0.10526644
## 0.6666667 1.519911e-01 0.14508863 8.545876e-01 0.11697914
## 0.6666667 1.747528e-01 0.15927534 8.245926e-01 0.13013958
## 0.6666667 2.009233e-01 0.17405143 7.806792e-01 0.14361769
## 0.6666667 2.310130e-01 0.18953635 6.794742e-01 0.15683304
## 0.6666667 2.656088e-01 0.20632364 1.509370e-02 0.17133457
## 0.6666667 3.053856e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 3.511192e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 4.037017e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 4.641589e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 5.336699e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 6.135907e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 7.054802e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 8.111308e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 9.326033e-01 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.072267e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.232847e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.417474e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.629751e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.873817e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 2.154435e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 2.477076e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 2.848036e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 3.274549e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 3.764936e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 4.328761e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 4.977024e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 5.722368e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 6.579332e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 7.564633e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 8.697490e+00 0.21473400 1.000000e+00 0.18063976
## 0.6666667 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.000000e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.149757e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.321941e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.519911e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.747528e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.009233e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.310130e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.656088e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 3.053856e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 3.511192e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 4.037017e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 4.641589e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 5.336699e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 6.135907e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 7.054802e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 8.111308e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 9.326033e-05 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.072267e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.232847e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.417474e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.629751e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.873817e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.154435e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.477076e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.848036e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 3.274549e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 3.764936e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 4.328761e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 4.977024e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 5.722368e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 6.579332e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 7.564633e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 8.697490e-04 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.000000e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.149757e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.321941e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.519911e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 1.747528e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.009233e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.310130e-03 0.10373078 7.525113e-01 0.08642962
## 0.7777778 2.656088e-03 0.10387005 7.521376e-01 0.08682304
## 0.7777778 3.053856e-03 0.10335845 7.550034e-01 0.08710843
## 0.7777778 3.511192e-03 0.10121616 7.650295e-01 0.08584576
## 0.7777778 4.037017e-03 0.09927457 7.742784e-01 0.08489519
## 0.7777778 4.641589e-03 0.09772540 7.811446e-01 0.08416841
## 0.7777778 5.336699e-03 0.09594940 7.880172e-01 0.08303413
## 0.7777778 6.135907e-03 0.09432799 7.940034e-01 0.08279627
## 0.7777778 7.054802e-03 0.09150183 8.059531e-01 0.08098843
## 0.7777778 8.111308e-03 0.08706861 8.244560e-01 0.07623130
## 0.7777778 9.326033e-03 0.08292292 8.410662e-01 0.07200495
## 0.7777778 1.072267e-02 0.07990413 8.530535e-01 0.06894507
## 0.7777778 1.232847e-02 0.07731147 8.630115e-01 0.06648492
## 0.7777778 1.417474e-02 0.07529395 8.713675e-01 0.06461708
## 0.7777778 1.629751e-02 0.07372654 8.782753e-01 0.06298361
## 0.7777778 1.873817e-02 0.07211318 8.852636e-01 0.06054795
## 0.7777778 2.154435e-02 0.07114476 8.910697e-01 0.05888639
## 0.7777778 2.477076e-02 0.07195275 8.918009e-01 0.05966344
## 0.7777778 2.848036e-02 0.07354311 8.908306e-01 0.06108560
## 0.7777778 3.274549e-02 0.07577287 8.887848e-01 0.06295675
## 0.7777778 3.764936e-02 0.07773019 8.883833e-01 0.06482804
## 0.7777778 4.328761e-02 0.07973419 8.901851e-01 0.06631852
## 0.7777778 4.977024e-02 0.08292070 8.912854e-01 0.06859961
## 0.7777778 5.722368e-02 0.08702145 8.920214e-01 0.07134134
## 0.7777778 6.579332e-02 0.09202902 8.935336e-01 0.07474389
## 0.7777778 7.564633e-02 0.09861833 8.939180e-01 0.07929818
## 0.7777778 8.697490e-02 0.10699147 8.916625e-01 0.08555682
## 0.7777778 1.000000e-01 0.11703187 8.861067e-01 0.09436071
## 0.7777778 1.149757e-01 0.12956923 8.707442e-01 0.10492482
## 0.7777778 1.321941e-01 0.14298892 8.458000e-01 0.11619205
## 0.7777778 1.519911e-01 0.15625269 8.229310e-01 0.12838535
## 0.7777778 1.747528e-01 0.17097349 7.900335e-01 0.14101932
## 0.7777778 2.009233e-01 0.18812136 6.818371e-01 0.15545576
## 0.7777778 2.310130e-01 0.20719343 2.033927e-06 0.17181008
## 0.7777778 2.656088e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 3.053856e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 3.511192e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 4.037017e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 4.641589e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 5.336699e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 6.135907e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 7.054802e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 8.111308e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 9.326033e-01 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.072267e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.232847e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.417474e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.629751e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.873817e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 2.154435e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 2.477076e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 2.848036e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 3.274549e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 3.764936e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 4.328761e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 4.977024e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 5.722368e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 6.579332e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 7.564633e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 8.697490e+00 0.21473400 1.000000e+00 0.18063976
## 0.7777778 1.000000e+01 0.21473400 1.000000e+00 0.18063976
## 0.8888889 1.000000e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.149757e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.321941e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.519911e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.747528e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.009233e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.310130e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.656088e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 3.053856e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 3.511192e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 4.037017e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 4.641589e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 5.336699e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 6.135907e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 7.054802e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 8.111308e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 9.326033e-05 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.072267e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.232847e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.417474e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.629751e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.873817e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.154435e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.477076e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.848036e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 3.274549e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 3.764936e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 4.328761e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 4.977024e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 5.722368e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 6.579332e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 7.564633e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 8.697490e-04 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.000000e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.149757e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.321941e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.519911e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 1.747528e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.009233e-03 0.10404133 7.512634e-01 0.08752398
## 0.8888889 2.310130e-03 0.10431484 7.502774e-01 0.08798592
## 0.8888889 2.656088e-03 0.10422607 7.513963e-01 0.08859809
## 0.8888889 3.053856e-03 0.10276078 7.588013e-01 0.08840761
## 0.8888889 3.511192e-03 0.10126707 7.666797e-01 0.08805645
## 0.8888889 4.037017e-03 0.09932022 7.753550e-01 0.08653286
## 0.8888889 4.641589e-03 0.09729822 7.834406e-01 0.08440000
## (... remaining rows of the alpha/lambda resampling grid omitted ...)
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0.7777778 and lambda
## = 0.02154435.
## alpha lambda
## 756 0.7777778 0.02154435
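The selected values can be read through the elastic net penalty itself, which blends the ridge (L2) and lasso (L1) terms via alpha. A minimal base-R sketch of that penalty (illustrative only; `beta` is a made-up coefficient vector, not from the fitted model):

```r
# Elastic net penalty: lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1)
# With the selected alpha = 0.778 the penalty leans toward the lasso,
# which is why some coefficients can be shrunk exactly to zero.
elastic_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -0.2, 0.1)  # hypothetical coefficients
elastic_penalty(beta, alpha = 0.7777778, lambda = 0.02154435)
```

At alpha = 1 the penalty reduces to the pure lasso, at alpha = 0 to pure ridge.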
Random forest
set.seed(123)
# Define LOOCV control
control <- trainControl(
method = "LOOCV",
returnResamp = "all"
)
# Define hyperparameter grid for Random Forest
rf_grid <- expand.grid(
mtry = seq(2, ncol(data_models) - 1, by = 2)
)
# mtry is the number of predictors randomly sampled as split candidates at each node
# Train Random Forest model with hyperparameter tuning
rf_model <- train(
mean_trans_docs ~ .,
data = data_models,
method = "rf",
preProc = c('scale', 'center'),
trControl = control,
tuneGrid = rf_grid,
importance = TRUE # Calculate variable importance
)
# Results
print(rf_model)
## Random Forest
##
## 28 samples
## 77 predictors
##
## Pre-processing: scaled (77), centered (77)
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 27, 27, 27, 27, 27, 27, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.1357249 0.6924307 0.10795765
## 4 0.1228614 0.7490363 0.09861979
## 6 0.1167004 0.7499508 0.09334658
## 8 0.1129228 0.7768280 0.09042208
## 10 0.1125528 0.7677563 0.09142534
## 12 0.1126246 0.7553550 0.09243385
## 14 0.1107528 0.7581780 0.08968946
## 16 0.1091306 0.7694003 0.08881322
## 18 0.1093830 0.7640607 0.09178448
## 20 0.1067000 0.7739386 0.08760491
## 22 0.1055854 0.7761217 0.08863966
## 24 0.1056703 0.7737960 0.08727348
## 26 0.1057519 0.7750241 0.08859073
## 28 0.1032864 0.7825657 0.08630664
## 30 0.1057640 0.7698629 0.08858951
## 32 0.1060711 0.7700082 0.08921265
## 34 0.1026861 0.7856051 0.08646459
## 36 0.1068056 0.7595338 0.09075249
## 38 0.1023811 0.7835225 0.08794130
## 40 0.1041163 0.7752714 0.08871807
## 42 0.1026975 0.7794132 0.08602468
## 44 0.1050040 0.7689413 0.08972529
## 46 0.1028829 0.7786960 0.08751544
## 48 0.1020675 0.7838621 0.08706345
## 50 0.1039348 0.7764866 0.08834842
## 52 0.1044070 0.7717451 0.08901207
## 54 0.1036160 0.7752264 0.08817118
## 56 0.1027465 0.7802636 0.08709824
## 58 0.1053265 0.7693986 0.09015932
## 60 0.1054170 0.7650530 0.09004271
## 62 0.1051864 0.7620469 0.08882297
## 64 0.1037047 0.7745185 0.08828945
## 66 0.1044085 0.7684377 0.08937645
## 68 0.1053307 0.7665978 0.09024869
## 70 0.1026242 0.7809591 0.08650800
## 72 0.1012078 0.7846053 0.08721272
## 74 0.1042092 0.7720311 0.08879919
## 76 0.1033834 0.7746330 0.08915758
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 72.
## mtry
## 36 72
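For reference, the grid searched above spans every even mtry value from 2 up to the number of predictors; a quick sketch of its size, compared against randomForest's usual regression default of floor(p / 3) (stated here as an assumption about the package default, not something tuned in this analysis):

```r
# data_models has 77 predictors, so ncol(data_models) - 1 = 77
mtry_grid <- seq(2, 77, by = 2)
length(mtry_grid)   # 38 candidate values, matching the 38 rows printed above
floor(77 / 3)       # randomForest's default mtry for regression
```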
Gradient boosting
set.seed(123)
control <- trainControl(
method = "LOOCV",
returnResamp = "all"
)
gbm_grid <- expand.grid(
n.trees = c(50, 100, 200), # Number of trees
interaction.depth = c(1, 2, 3), # Tree depth
shrinkage = c(0.01, 0.05, 0.1), # Shrinkage
n.minobsinnode = c(1, 3, 5) # Minimum observations
)
gbm_model <- train(
mean_trans_docs ~ .,
data = data_models,
method = "gbm",
preProc = c('scale', 'center'),
trControl = control,
tuneGrid = gbm_grid,
verbose = FALSE)
print(gbm_model)
## Stochastic Gradient Boosting
##
## 28 samples
## 77 predictors
##
## Pre-processing: scaled (77), centered (77)
## Resampling: Leave-One-Out Cross-Validation
## Summary of sample sizes: 27, 27, 27, 27, 27, 27, ...
## Resampling results across tuning parameters:
##
## n.trees interaction.depth shrinkage n.minobsinnode RMSE Rsquared MAE
## (80 of the 81 tuning-grid rows omitted; the MAE column, which wrapped onto
## separate lines in the raw output, has been rejoined to its row)
## 200 2 0.10 3 0.08658150 0.8680486 0.06749242
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 200, interaction.depth =
## 2, shrinkage = 0.1 and n.minobsinnode = 3.
## n.trees interaction.depth shrinkage n.minobsinnode
## 71 200 2 0.1 3
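The grid above crosses three values of each of the four hyperparameters, so caret evaluates 81 candidate configurations; under LOOCV with 28 countries, each configuration is refit 28 times. A minimal sketch of that count:

```r
# Size of the GBM tuning grid and the implied number of LOOCV model fits
gbm_grid <- expand.grid(
  n.trees = c(50, 100, 200),
  interaction.depth = c(1, 2, 3),
  shrinkage = c(0.01, 0.05, 0.1),
  n.minobsinnode = c(1, 3, 5)
)
nrow(gbm_grid)        # 81 hyperparameter combinations
nrow(gbm_grid) * 28   # model fits under LOOCV with n = 28
```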
Comparing predictions
# Adding benchmark (mean)
benchFit <- lm(mean_trans_docs ~ 1, data=data_models)
benchmark_pred <- predict(benchFit, data_aggr)
elastic_pred <- predict(elastic_net_model, data_aggr)
rf_pred <- predict(rf_model, data_aggr)
gbm_pred <- predict(gbm_model, data_aggr)
# Actual values
actual <- data_aggr$mean_trans_docs
# Create a data frame with all predictions and actual values
all_predictions <- data.frame(
country = data_aggr$country,
actual = actual,
benchmark = benchmark_pred,
elastic_net = elastic_pred,
random_forest = rf_pred,
gradient_boosting = gbm_pred
)
# Calculate errors for each model
all_predictions$benchmark_error <- abs(all_predictions$actual - all_predictions$benchmark)
all_predictions$elastic_error <- abs(all_predictions$actual - all_predictions$elastic_net)
all_predictions$rf_error <- abs(all_predictions$actual - all_predictions$random_forest)
all_predictions$gbm_error <- abs(all_predictions$actual - all_predictions$gradient_boosting)
# Calculate performance metrics for each model
calculate_metrics <- function(actual, predicted) {
residuals <- actual - predicted
r_squared <- 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
rmse <- sqrt(mean(residuals^2))
mae <- mean(abs(residuals))
return(c(R_squared = r_squared, RMSE = rmse, MAE = mae))
}
# Get metrics for each model
benchmark_metrics <- calculate_metrics(actual, benchmark_pred)
elastic_metrics <- calculate_metrics(actual, elastic_pred)
rf_metrics <- calculate_metrics(actual, rf_pred)
gbm_metrics <- calculate_metrics(actual, gbm_pred)
# Combine metrics
model_metrics <- rbind(
"Benchmark(=Mean)" = benchmark_metrics,
Elastic_Net = elastic_metrics,
Random_Forest = rf_metrics,
Gradient_Boosting = gbm_metrics
)
# Print performance comparison
print(model_metrics)
## R_squared RMSE MAE
## Benchmark(=Mean) 0.0000000 0.207064927 0.1741883440
## Elastic_Net 0.9469052 0.047712555 0.0401216524
## Random_Forest 0.9603966 0.041207185 0.0341252826
## Gradient_Boosting 0.9999631 0.001257167 0.0006068361
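As a sanity check, `calculate_metrics()` can be exercised on toy numbers (hypothetical values, not drawn from the data); repeating the definition here so the sketch is self-contained:

```r
calculate_metrics <- function(actual, predicted) {
  residuals <- actual - predicted
  r_squared <- 1 - sum(residuals^2) / sum((actual - mean(actual))^2)
  rmse <- sqrt(mean(residuals^2))
  mae <- mean(abs(residuals))
  c(R_squared = r_squared, RMSE = rmse, MAE = mae)
}

actual    <- c(0.2, 0.5, 0.8)
predicted <- c(0.25, 0.45, 0.85)
calculate_metrics(actual, predicted)
# A mean-only benchmark always scores R_squared = 0 by construction,
# which is exactly what the Benchmark(=Mean) row above shows.
calculate_metrics(actual, rep(mean(actual), length(actual)))
```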
We predict this well partly because the 28 countries share many characteristics. The small number of observations also makes LOOCV computationally feasible, and it yields impressive metrics. These results should nevertheless be treated with caution: the combination of so few observations and such high performance metrics suggests the models are overfitting. Because the observations are few and similar, the models predict very well within this set, but their generalizability is low. A much larger and more diverse set of countries in the training data would be needed to build a model applicable beyond Europe.
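The gap between in-sample fit and honest out-of-sample error is easy to demonstrate. A minimal sketch on synthetic noise (not the study's variables), using the closed-form LOOCV residuals for OLS, e_i / (1 - h_ii), which are always at least as large in magnitude as the training residuals:

```r
# With few observations and many predictors, in-sample fit looks
# deceptively good even when there is nothing real to learn.
set.seed(123)
n <- 28; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)                      # pure noise response
d <- data.frame(y, X)

fit <- lm(y ~ ., data = d)
train_rmse <- sqrt(mean(residuals(fit)^2))

# Closed-form LOOCV residuals for OLS: e_i / (1 - h_ii)
h <- hatvalues(fit)
loocv_rmse <- sqrt(mean((residuals(fit) / (1 - h))^2))

c(train = train_rmse, loocv = loocv_rmse)  # loocv is always the larger
```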
# Sort predictions by actual values (descending)
sorted_predictions <- all_predictions[order(all_predictions$actual, decreasing = TRUE), ]
# Print sorted predictions
print(sorted_predictions[, c("country", "actual", "benchmark", "elastic_net", "random_forest", "gradient_boosting")])
## country actual benchmark elastic_net random_forest gradient_boosting
## 9 ES 0.9033659 0.5965319 0.8524193 0.8282083 0.9033659
## 22 NL 0.8663265 0.5965319 0.8423528 0.8290363 0.8668275
## 21 MT 0.8526786 0.5965319 0.8221836 0.8097210 0.8524930
## 7 DK 0.7967742 0.5965319 0.7403161 0.7649690 0.7961726
## 6 DE 0.7906977 0.5965319 0.7787671 0.7764453 0.7906392
## 11 FR 0.7901639 0.5965319 0.7352544 0.7741639 0.7897045
## 19 LU 0.7884615 0.5965319 0.7664504 0.7757374 0.7896824
## 24 PT 0.7749726 0.5965319 0.6632592 0.7431345 0.7746441
## 16 IE 0.7324263 0.5965319 0.7264290 0.7277507 0.7322838
## 2 BE 0.7278107 0.5965319 0.7260266 0.6937198 0.7278543
## 26 SE 0.7217295 0.5965319 0.7265787 0.7414309 0.7214227
## 10 FI 0.7100793 0.5965319 0.7267072 0.7292133 0.7108800
## 12 GB 0.7014428 0.5965319 0.7396484 0.7269426 0.7011545
## 1 AT 0.6154650 0.5965319 0.6371225 0.6549824 0.6153970
## 8 EE 0.6127024 0.5965319 0.5584945 0.5627297 0.6110033
## 13 GR 0.5980498 0.5965319 0.6064764 0.5602993 0.5976448
## 4 CY 0.5388471 0.5965319 0.5638382 0.5100617 0.5386893
## 27 SI 0.5282609 0.5965319 0.5777653 0.5592061 0.5281304
## 23 PL 0.5000000 0.5965319 0.4459348 0.4496013 0.4998722
## 20 LV 0.4840614 0.5965319 0.4266426 0.4506696 0.4835733
## 17 IT 0.4769404 0.5965319 0.5510112 0.4912194 0.4765180
## 14 HR 0.4539474 0.5965319 0.5159239 0.4777817 0.4532822
## 5 CZ 0.4472252 0.5965319 0.4214334 0.4710861 0.4469851
## 18 LT 0.4265570 0.5965319 0.4801427 0.4422469 0.4264217
## 28 SK 0.2990971 0.5965319 0.3362504 0.2948503 0.2986596
## 25 RO 0.2386740 0.5965319 0.2716489 0.2808620 0.2376333
## 3 BG 0.1666667 0.5965319 0.2572985 0.2650617 0.1726353
## 15 HU 0.1594684 0.5965319 0.2065162 0.2565938 0.1595363
# Visualize predictions across countries
# Reshape the data for plotting
plot_data <- sorted_predictions %>%
select(country, actual, elastic_net, random_forest, gradient_boosting) %>%
gather(key = "model", value = "prediction", -country, -actual)
# Create plot
ggplot(plot_data, aes(x = reorder(country, -actual), y = prediction, fill = model)) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_point(aes(y = actual), color = "black", size = 2) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "Model Predictions vs Actual Values by Country",
x = "Country", y = "Proportion Supporting Trans Document Changes") +
scale_fill_brewer(palette = "Set1")
# Create a scatterplot of actual vs predicted for each model
ggplot(plot_data, aes(x = actual, y = prediction, color = model)) +
geom_point(size = 3, alpha = 0.7) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
facet_wrap(~model) +
theme_minimal() +
labs(title = "Actual vs Predicted Values by Model",
x = "Actual Values", y = "Predicted Values")
# Find which model performed best for each country
all_predictions$best_model <- apply(all_predictions[, c("elastic_error", "rf_error", "gbm_error")], 1,
function(x) c("Elastic Net", "Random Forest", "Gradient Boosting")[which.min(x)])
# Count how many countries each model performed best on
table(all_predictions$best_model)
##
## Gradient Boosting
## 28
Checks for overfitting in GBM
# Calculate residuals
gbm_residuals <- actual - gbm_pred
# Create a dataframe
residuals_df <- data.frame(actual, gbm_residuals)
# Plot residuals
ggplot(residuals_df, aes(x = actual, y = gbm_residuals)) +
geom_point(color = "red", alpha = 0.7) +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_minimal() +
labs(title = "Residuals Plot for Gradient Boosting",
x = "Actual Values", y = "Residuals")
Variable importance
Elastic Net:
# Extracting all coefficients of the best Elastic Net Model
coefficients <- coef(elastic_net_model$finalModel, s=elastic_net_model$bestTune$lambda)
# Keeping only the non-zero coefficients
non0_coefficients <- data.frame(
feature = rownames(coefficients),
coefficient = as.numeric(coefficients)) |>
filter(coefficient != 0) |>
arrange(desc(abs(coefficient)))
non0_coefficients
## feature coefficient
## 1 (Intercept) 5.965319e-01
## 2 mean_gender_docs 1.085275e-01
## 3 mean_roma_minority -3.258333e-02
## 4 mean_antilgbtq_rights -2.203364e-02
## 5 `friends_trans_Refusal (SPONTANEOUS)` -1.689666e-02
## 6 mean_nonEU_national 1.025104e-02
## 7 `phone_access_No telephone` -9.770012e-03
## 8 `life_sat_Not at all satisfied` -9.335941e-03
## 9 democracy_index 3.171835e-03
## 10 mean_minority_discri -3.107835e-03
## 11 `internet_use_Two or three times a week` -4.820864e-06
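The exact zeros in the remaining coefficients come from the L1 part of the penalty. In the simplest (orthonormal-design) case, the lasso solution is the soft-thresholded least-squares estimate; a sketch with hypothetical coefficient values, not the fitted model's:

```r
# Soft-thresholding: S(z, lambda) = sign(z) * max(|z| - lambda, 0)
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

ols <- c(0.108, -0.033, 0.010, -0.005)   # hypothetical OLS estimates
soft_threshold(ols, lambda = 0.02)
# Coefficients smaller in magnitude than lambda become exactly zero;
# a pure ridge (L2) penalty would only shrink them toward zero.
```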
Random Forest:
# Plotting the top 10 most important variables
varImp(rf_model, scale = F)$importance %>%
as.data.frame() %>%
tibble::rownames_to_column("variable") %>%
arrange(desc(Overall)) %>%
slice_head(n = 10) %>%
ggplot(aes(x = reorder(variable, Overall), y = Overall)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Feature Importance - Random Forest",
x = "Features", y = "Importance") +
theme_minimal()
Gradient Boosting:
# Plotting the top 10 most important variables
varImp(gbm_model, scale = F)$importance %>%
as.data.frame() %>%
tibble::rownames_to_column("variable") %>%
arrange(desc(Overall)) %>%
slice_head(n = 10) %>%
ggplot(aes(x = reorder(variable, Overall), y = Overall)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Feature Importance - Gradient Boosting",
x = "Features", y = "Importance") +
theme_minimal()
Comments on the most important variables…
Social class
Although D63 of the questionnaire measures only self-perceived social class (of the respondent and their household), it is still a relevant and seemingly divisive variable. We observe a positive association: support grows with (perceived) social class, while explicit lack of support and non-response are more common among lower-class respondents.